This is an archive of the discontinued LLVM Phabricator instance.

Differential D103362

[LoopUnroll] Separate peeling from unrolling
ClosedPublic

Authored by nikic on May 29 2021, 10:00 AM.

Details

Reviewers

reames
fhahn
efriedma
lebedev.ri

Commits

rGdb45746821ab: [LoopUnroll] Separate peeling from unrolling

Summary

Loop peeling is currently performed as part of UnrollLoop(). The whole setup is somewhat odd; here is my current understanding of the situation:

Outside test scenarios, peeling will always be performed as part of an unroll with Count=1: https://github.com/llvm/llvm-project/blob/ffb48d48e45c72ed81dda4983ccb06e800cdbbd0/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp#L873-L879
Of course, unrolling with count one does not do a great deal. My deduced reason for doing this is to run simplifyLoopAfterUnroll(), without which many tests indeed fail.
When testing, it's possible to specify both -unroll-peel-count and -unroll-count, in which case both peeling and unrolling will be performed. However, if you actually try to do this, you apparently run into this miscompile: https://bugs.llvm.org/show_bug.cgi?id=45939 / D80080
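For readers unfamiliar with the transform, here is a source-level sketch of what peeling two iterations means, modelled on the `@a[8]` loop used in the tests of this revision. The function names are made up for illustration, and the real transform works on LLVM IR rather than on C++ source, so treat this only as an analogy:

```cpp
#include <array>
#include <cstddef>

// Original loop: stores i into a[i] for eight iterations.
std::array<int, 8> fillOriginal() {
  std::array<int, 8> a{};
  for (std::size_t i = 0; i < a.size(); ++i)
    a[i] = static_cast<int>(i);
  return a;
}

// Hand-peeled by two: the first two iterations are copied in front of the
// loop, each guarded by the exit test. The remaining loop starts at i = 2
// and is otherwise unchanged (it is not unrolled).
std::array<int, 8> fillPeeledByTwo() {
  std::array<int, 8> a{};
  std::size_t i = 0;
  if (i < a.size()) { a[i] = static_cast<int>(i); ++i; } // peeled iteration 0
  if (i < a.size()) { a[i] = static_cast<int>(i); ++i; } // peeled iteration 1
  for (; i < a.size(); ++i)
    a[i] = static_cast<int>(i);
  return a;
}
```

Both functions produce the same array; only the control-flow shape differs, which is why peeling and unrolling can be treated as separate steps.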

Based on that reasoning (and suggested by @efriedma on the referenced review) I've moved the peeling code out of UnrollLoop() into tryToUnrollLoop(). An error is thrown if a test explicitly requests to perform both peeling and unrolling.

TBH the way peeling is implemented has left me very confused, so I'm not sure I'm doing the right thing here.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

nikic created this revision. May 29 2021, 10:00 AM

Herald added subscribers: zzheng, hiraditya. May 29 2021, 10:00 AM

nikic requested review of this revision. May 29 2021, 10:00 AM

Herald added a project: Restricted Project. May 29 2021, 10:00 AM

Herald added a subscriber: llvm-commits.

This doesn't make sense to me.
So now if we ask to unroll loop, and peel stuff first, it will only do the peeling?
The tests even show as much.

This revision now requires changes to proceed. May 29 2021, 10:05 AM

Harbormaster completed remote builds in B106803: Diff 348641. May 29 2021, 10:40 AM

Yes, silently ignoring one option isn't good. I've changed the code to emit an error if both options are specified.

Harbormaster completed remote builds in B106812: Diff 348653. May 29 2021, 12:10 PM

I was actually talking about not the explicit options, but how this works in the pipeline,
and i guess this somehow does not affect that use-case at all.
IMO that becomes even more confusing, but since this doesn't regress pipelines, i don't care.

In D103362#2788792, @lebedev.ri wrote:

I was actually talking about not the explicit options, but how this works in the pipeline,
and i guess this somehow does not affect that use-case at all.
IMO that becomes even more confusing, but since this doesn't regress pipelines, i don't care.

Yes, this does not affect the optimization pipeline, it only affects tests that specify explicit unroll counts.

I think it is worth noting that there are cases where performing both (non-PGO) peeling and unrolling could be sensible, namely when invariance peeling renders a previously non-analyzable loop analyzable. However, this requires that we first perform a peel and then re-analyze the loop for unrolling profitability from scratch, not that we perform a combined peel and unroll in a single step. As such, even if we want to support both peeling and unrolling a loop in the future, I believe this patch would still be the necessary first step in that direction.
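As a rough source-level illustration of the invariance-peeling point above, consider a loop-carried value that differs only on the first iteration. The example below is hypothetical (names and constants are invented), and whether LLVM's analyses would treat a given loop this way is not something this sketch establishes:

```cpp
#include <cstddef>
#include <vector>

// `scale` is 1 on the first iteration and 3 on every later one, so inside
// the loop it is a loop-carried value rather than a constant.
int sumOriginal(const std::vector<int> &data) {
  int scale = 1;
  int sum = 0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    sum += data[i] * scale;
    scale = 3;
  }
  return sum;
}

// After peeling the first iteration by hand, `scale` is the constant 3 in
// the remaining loop. A later, separate analysis of that loop sees a
// simpler body than the original one.
int sumPeeled(const std::vector<int> &data) {
  int sum = 0;
  if (!data.empty()) {
    sum += data[0] * 1; // peeled first iteration
    for (std::size_t i = 1; i < data.size(); ++i)
      sum += data[i] * 3;
  }
  return sum;
}
```

The simplification only becomes visible once the peeled loop is looked at again from scratch, which matches the peel-then-re-analyze order described in the comment.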

I'm not sure we can drop the behavior of both peeling and unrolling. The tests use the command lines, but I believe there's a pragma/metadata mechanism which can be used to achieve the same effect. (Vague memory of prior conversations w/ @Meinersbur)

Supporting the existing behavior with the new code structure doesn't seem too hard, would you mind updating to continue supporting both? (Well, while fixing the miscompile noted in D103620.)

In D103362#2796508, @reames wrote:

I'm not sure we can drop the behavior of both peeling and unrolling. The tests use the command lines, but I believe there's a pragma/metadata mechanism which can be used to achieve the same effect. (Vague memory of prior conversations w/ @Meinersbur)

While there is indeed metadata to set an unroll count, I don't believe there is metadata to set a peel count. If an unroll count is set via metadata, we will not attempt to compute a peel count.
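For context, one way the unroll-count metadata reaches the optimizer is Clang's loop pragma, which Clang lowers to `llvm.loop.unroll.count` loop metadata. The function below is an invented example; the pragma is guarded because other compilers may not recognise it. This sketch shows no peel-count counterpart, consistent with the comment above:

```cpp
#include <cstddef>
#include <vector>

// Sums a vector while requesting an unroll factor of 4 from Clang.
// The hint only affects code generation; the result is unchanged.
int sumWithUnrollHint(const std::vector<int> &v) {
  int sum = 0;
#if defined(__clang__)
#pragma clang loop unroll_count(4)
#endif
  for (std::size_t i = 0; i < v.size(); ++i)
    sum += v[i];
  return sum;
}
```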

Supporting the existing behavior with the new code structure doesn't seem too hard, would you mind updating to continue supporting both? (Well, while fixing the miscompile noted in D103620.)

I don't think the existing behavior makes sense, so I would prefer not to preserve it. It seems pointless to go out of our way to allow a scenario in testing that cannot appear as part of the optimization pipeline.

nikic planned changes to this revision. Jun 4 2021, 12:12 PM

@nikic I found your last comment convincing.

LGTM

llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
Line 1189 (inline comment by reames): In the future, we really should pull out a function with the other loop simplification, and use that here, but that's for the future.

If an explicit PeelCount has been provided for testing, use it directly and don't try to go through other unroll heuristics. Otherwise we might try to use the explicit test PeelCount together with a full unroll heuristic. (Once again, this doesn't affect the optimization pipeline, just testing.)

Also preserve existing tests by removing explicit unroll counts from tests that already have an explicit peel count. The tests are still reasonable apart from that.
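The rule described in this update can be modelled in a few lines. This is a standalone sketch, not the LLVM code: `UnrollPlan` and `choosePlan` are invented names, the heuristic result is passed in as a plain number, and an exception stands in for the `report_fatal_error` call used in the patch:

```cpp
#include <stdexcept>

// Hypothetical summary of what the pass decided to do with a loop.
struct UnrollPlan {
  unsigned unrollCount; // 1 means "do not unroll"
  unsigned peelCount;   // 0 means "do not peel"
};

// explicitPeel:     value of the testing-only peel option, 0 if absent
// unrollCountGiven: whether an explicit unroll count was supplied
// explicitUnroll:   that explicit unroll count, if supplied
// heuristicUnroll:  what the usual heuristics would have picked
UnrollPlan choosePlan(unsigned explicitPeel, bool unrollCountGiven,
                      unsigned explicitUnroll, unsigned heuristicUnroll) {
  if (explicitPeel != 0) {
    // An explicit peel count bypasses every unroll heuristic.
    if (unrollCountGiven)
      throw std::invalid_argument(
          "Cannot specify both explicit peel count and explicit unroll count");
    return {1, explicitPeel};
  }
  if (unrollCountGiven)
    return {explicitUnroll, 0};
  return {heuristicUnroll, 0};
}
```

With this shape, a request for peeling never reaches the full-unroll heuristic, which is the interaction the update is meant to rule out.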

This revision is now accepted and ready to land. Jun 4 2021, 1:14 PM

Harbormaster completed remote builds in B107734: Diff 349952. Jun 4 2021, 2:08 PM

LGTM still holds.

Closed by commit rGdb45746821ab: [LoopUnroll] Separate peeling from unrolling (authored by nikic). Jun 5 2021, 1:32 AM

This revision was automatically updated to reflect the committed changes.

nikic added a commit: rGdb45746821ab: [LoopUnroll] Separate peeling from unrolling.

Revision Contents

Path                                                                          Size
llvm/include/llvm/Transforms/Utils/UnrollLoop.h                               1 line
llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp                                 41 lines
llvm/lib/Transforms/Utils/LoopUnroll.cpp                                      46 lines
llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp                               2 lines
llvm/test/Transforms/LoopUnroll/peel-loop-and-unroll.ll                       22 lines
llvm/test/Transforms/LoopUnroll/pr33437.ll                                    24 lines
llvm/test/Transforms/LoopUnroll/pr45939-peel-count-and-complete-unroll.ll     131 lines
llvm/test/Transforms/LoopUnroll/wrong_assert_in_peeling.ll                    6 lines

Diff 350040

llvm/include/llvm/Transforms/Utils/UnrollLoop.h

@@ ... @@
 struct UnrollLoopOptions {
   unsigned Count;
   unsigned TripCount;
   bool Force;
   bool AllowRuntime;
   bool AllowExpensiveTripCount;
   unsigned TripMultiple;
-  unsigned PeelCount;
   bool UnrollRemainder;
   bool ForgetAllSCEV;
 };

 LoopUnrollResult UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
                             ScalarEvolution *SE, DominatorTree *DT,
                             AssumptionCache *AC,
                             const llvm::TargetTransformInfo *TTI,

llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp

@@ ... @@ bool llvm::computeUnrollCount(
     ScalarEvolution &SE, const SmallPtrSetImpl<const Value *> &EphValues,
     OptimizationRemarkEmitter *ORE, unsigned &TripCount, unsigned MaxTripCount,
     bool MaxOrZero, unsigned &TripMultiple, unsigned LoopSize,
     TargetTransformInfo::UnrollingPreferences &UP,
     TargetTransformInfo::PeelingPreferences &PP, bool &UseUpperBound) {

   UnrollCostEstimator UCE(*L, LoopSize);

+  // Use an explicit peel count that has been specified for testing. In this
+  // case it's not permitted to also specify an explicit unroll count.
+  if (PP.PeelCount) {
+    if (UnrollCount.getNumOccurrences() > 0) {
+      report_fatal_error("Cannot specify both explicit peel count and "
+                         "explicit unroll count");
+    }
+    UP.Count = 1;
+    UP.Runtime = false;
+    return true;
+  }
+
   // Check for explicit Count.
   // 1st priority is unroll count set by "unroll-count" option.
   bool UserUnrollCount = UnrollCount.getNumOccurrences() > 0;
   if (UserUnrollCount) {
     UP.Count = UnrollCount;
     UP.AllowExpensiveTripCount = true;
     UP.Force = true;
     if (UP.AllowRemainder && UCE.getUnrolledLoopSize(UP) < UP.Threshold)
@@ ... @@ bool IsCountSetExplicitly = computeUnrollCount(
       L, TTI, DT, LI, SE, EphValues, &ORE, TripCount, MaxTripCount, MaxOrZero,
       TripMultiple, LoopSize, UP, PP, UseUpperBound);
   if (!UP.Count)
     return LoopUnrollResult::Unmodified;
   // Unroll factor (Count) must be less or equal to TripCount.
   if (TripCount && UP.Count > TripCount)
     UP.Count = TripCount;

+  if (PP.PeelCount) {
+    assert(UP.Count == 1 && "Cannot perform peel and unroll in the same step");
+    LLVM_DEBUG(dbgs() << "PEELING loop %" << L->getHeader()->getName()
+                      << " with iteration count " << PP.PeelCount << "!\n");
+    ORE.emit([&]() {
+      return OptimizationRemark(DEBUG_TYPE, "Peeled", L->getStartLoc(),
+                                L->getHeader())
+             << " peeled loop by " << ore::NV("PeelCount", PP.PeelCount)
+             << " iterations";
+    });
+
+    if (peelLoop(L, PP.PeelCount, LI, &SE, &DT, &AC, PreserveLCSSA)) {
+      simplifyLoopAfterUnroll(L, true, LI, &SE, &DT, &AC, &TTI);
+      // If the loop was peeled, we already "used up" the profile information
+      // we had, so we don't want to unroll or peel again.
+      if (PP.PeelProfiledIterations)
+        L->setLoopAlreadyUnrolled();
+      return LoopUnrollResult::PartiallyUnrolled;
+    }
+    return LoopUnrollResult::Unmodified;
+  }
+
   // Save loop properties before it is transformed.
   MDNode *OrigLoopID = L->getLoopID();

   // Unroll the loop.
   Loop *RemainderLoop = nullptr;
   LoopUnrollResult UnrollResult = UnrollLoop(
       L,
       {UP.Count, TripCount, UP.Force, UP.Runtime, UP.AllowExpensiveTripCount,
-       TripMultiple, PP.PeelCount, UP.UnrollRemainder, ForgetAllSCEV},
+       TripMultiple, UP.UnrollRemainder, ForgetAllSCEV},
       LI, &SE, &DT, &AC, &TTI, &ORE, PreserveLCSSA, &RemainderLoop);
   if (UnrollResult == LoopUnrollResult::Unmodified)
     return LoopUnrollResult::Unmodified;

   if (RemainderLoop) {
     Optional<MDNode *> RemainderLoopID =
         makeFollowupLoopID(OrigLoopID, {LLVMLoopUnrollFollowupAll,
                                         LLVMLoopUnrollFollowupRemainder});
@@ ... @@ if (NewLoopID.hasValue()) {
       // Do not setLoopAlreadyUnrolled if loop attributes have been specified
       // explicitly.
       return UnrollResult;
     }
   }

   // If loop has an unroll count pragma or unrolled by explicitly set count
   // mark loop as unrolled to prevent unrolling beyond that requested.
-  // If the loop was peeled, we already "used up" the profile information
-  // we had, so we don't want to unroll or peel again.
-  if (UnrollResult != LoopUnrollResult::FullyUnrolled &&
-      (IsCountSetExplicitly || (PP.PeelProfiledIterations && PP.PeelCount)))
+  if (UnrollResult != LoopUnrollResult::FullyUnrolled && IsCountSetExplicitly)
     L->setLoopAlreadyUnrolled();

   return UnrollResult;
 }

 namespace {

 class LoopUnroll : public LoopPass {

llvm/lib/Transforms/Utils/LoopUnroll.cpp

@@ ... @@
 #include "llvm/Support/CommandLine.h"
 #include "llvm/Support/Debug.h"
 #include "llvm/Support/GenericDomTree.h"
 #include "llvm/Support/MathExtras.h"
 #include "llvm/Support/raw_ostream.h"
 #include "llvm/Transforms/Utils/BasicBlockUtils.h"
 #include "llvm/Transforms/Utils/Cloning.h"
 #include "llvm/Transforms/Utils/Local.h"
-#include "llvm/Transforms/Utils/LoopPeel.h"
 #include "llvm/Transforms/Utils/LoopSimplify.h"
 #include "llvm/Transforms/Utils/LoopUtils.h"
 #include "llvm/Transforms/Utils/SimplifyIndVar.h"
 #include "llvm/Transforms/Utils/UnrollLoop.h"
 #include "llvm/Transforms/Utils/ValueMapper.h"
 #include <algorithm>
 #include <assert.h>
 #include <type_traits>
@@ ... @@
 ///
 /// If AllowRuntime is true then UnrollLoop will consider unrolling loops that
 /// have a runtime (i.e. not compile time constant) trip count. Unrolling these
 /// loops require a unroll "prologue" that runs "RuntimeTripCount % Count"
 /// iterations before branching into the unrolled loop. UnrollLoop will not
 /// runtime-unroll the loop if computing RuntimeTripCount will be expensive and
 /// AllowExpensiveTripCount is false.
 ///
-/// If we want to perform PGO-based loop peeling, PeelCount is set to the
-/// number of iterations we want to peel off.
-///
 /// The LoopInfo Analysis that is passed will be kept consistent.
 ///
 /// This utility preserves LoopInfo. It will also preserve ScalarEvolution and
 /// DominatorTree if they are non-null.
 ///
 /// If RemainderLoop is non-null, it will receive the remainder loop (if
 /// required and not fully unrolled).
 LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
@@ ... @@ if (ULO.TripMultiple != 1)
     LLVM_DEBUG(dbgs() << " Trip Multiple = " << ULO.TripMultiple << "\n");

   // Effectively "DCE" unrolled iterations that are beyond the tripcount
   // and will never be executed.
   if (ULO.TripCount != 0 && ULO.Count > ULO.TripCount)
     ULO.Count = ULO.TripCount;

   // Don't enter the unroll code if there is nothing to do.
-  if (ULO.TripCount == 0 && ULO.Count < 2 && ULO.PeelCount == 0) {
+  if (ULO.TripCount == 0 && ULO.Count < 2) {
     LLVM_DEBUG(dbgs() << "Won't unroll; almost nothing to do\n");
     return LoopUnrollResult::Unmodified;
   }

   assert(ULO.Count > 0);
   assert(ULO.TripMultiple > 0);
   assert(ULO.TripCount == 0 || ULO.TripCount % ULO.TripMultiple == 0);

-  bool Peeled = false;
-  if (ULO.PeelCount) {
-    Peeled = peelLoop(L, ULO.PeelCount, LI, SE, DT, AC, PreserveLCSSA);
-
-    // Successful peeling may result in a change in the loop preheader/trip
-    // counts. If we later unroll the loop, we want these to be updated.
-    if (Peeled) {
-      // According to our guards and profitability checks the only
-      // meaningful exit should be latch block. Other exits go to deopt,
-      // so we do not worry about them.
-      BasicBlock *ExitingBlock = L->getLoopLatch();
-      assert(ExitingBlock && "Loop without exiting block?");
-      assert(L->isLoopExiting(ExitingBlock) && "Latch is not exiting?");
-      ULO.TripCount = SE->getSmallConstantTripCount(L, ExitingBlock);
-      ULO.TripMultiple = SE->getSmallConstantTripMultiple(L, ExitingBlock);
-    }
-  }
-
   // Are we eliminating the loop control altogether? Note that we can know
   // we're eliminating the backedge without knowing exactly which iteration
   // of the unrolled body exits.
   const bool CompletelyUnroll = ULO.Count == ULO.TripCount;

   // We assume a run-time trip count if the compiler cannot
   // figure out the loop trip count and the unroll-runtime
   // flag is specified.
   bool RuntimeTripCount =
       (ULO.TripCount == 0 && ULO.Count > 0 && ULO.AllowRuntime);

-  assert((!RuntimeTripCount || !ULO.PeelCount) &&
-         "Did not expect runtime trip-count unrolling "
-         "and peeling for the same loop");
-
   // All these values should be taken only after peeling because they might have
   // changed.
   BasicBlock *Preheader = L->getLoopPreheader();
   BasicBlock *Header = L->getHeader();
   BasicBlock *LatchBlock = L->getLoopLatch();
   SmallVector<BasicBlock *, 4> ExitBlocks;
   L->getExitBlocks(ExitBlocks);
   std::vector<BasicBlock *> OriginalLoopBlocks = L->getBlocks();
@@ ... @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
   // unconditional branch in the unrolled loop in some cases.
   BranchInst *ExitingBI = nullptr;
   bool LatchIsExiting = L->isLoopExiting(LatchBlock);
   if (LatchIsExiting)
     ExitingBI = LatchBI;
   else if (BasicBlock *ExitingBlock = L->getExitingBlock())
     ExitingBI = dyn_cast<BranchInst>(ExitingBlock->getTerminator());
   if (!LatchBI || (LatchBI->isConditional() && !LatchIsExiting)) {
-    // If the peeling guard is changed this assert may be relaxed or even
-    // deleted.
-    assert(!Peeled && "Peeling guard changed!");
     LLVM_DEBUG(
         dbgs() << "Can't unroll; a conditional latch must exit the loop");
     return LoopUnrollResult::Unmodified;
   }
   LLVM_DEBUG({
     if (ExitingBI)
       dbgs() << " Exiting Block = " << ExitingBI->getParent()->getName()
              << "\n";
@@ ... @@ LLVM_DEBUG(dbgs() << "COMPLETELY UNROLLING loop %" << Header->getName()
                       << " with trip count " << ULO.TripCount << "!\n");
     if (ORE)
       ORE->emit([&]() {
         return OptimizationRemark(DEBUG_TYPE, "FullyUnrolled", L->getStartLoc(),
                                   L->getHeader())
                << "completely unrolled loop with "
                << NV("UnrollCount", ULO.TripCount) << " iterations";
       });
-  } else if (ULO.PeelCount) {
-    LLVM_DEBUG(dbgs() << "PEELING loop %" << Header->getName()
-                      << " with iteration count " << ULO.PeelCount << "!\n");
-    if (ORE)
-      ORE->emit([&]() {
-        return OptimizationRemark(DEBUG_TYPE, "Peeled", L->getStartLoc(),
-                                  L->getHeader())
-               << " peeled loop by " << NV("PeelCount", ULO.PeelCount)
-               << " iterations";
-      });
   } else {
     auto DiagBuilder = [&]() {
       OptimizationRemark Diag(DEBUG_TYPE, "PartialUnrolled", L->getStartLoc(),
                               L->getHeader());
       return Diag << "unrolled loop by a factor of "
                   << NV("UnrollCount", ULO.Count);
     };

@@ ... @@ if (Term && Term->isUnconditional()) {
       }
     }
   }
   // Apply updates to the DomTree.
   DT = &DTU.getDomTree();

   // At this point, the code is well formed. We now simplify the unrolled loop,
   // doing constant propagation and dead code elimination as we go.
-  simplifyLoopAfterUnroll(L, !CompletelyUnroll && (ULO.Count > 1 || Peeled), LI,
-                          SE, DT, AC, TTI);
+  simplifyLoopAfterUnroll(L, !CompletelyUnroll && ULO.Count > 1, LI, SE, DT, AC,
+                          TTI);

   NumCompletelyUnrolled += CompletelyUnroll;
   ++NumUnrolled;

   Loop *OuterL = L->getParentLoop();
   // Update LoopInfo if the loop is completely removed.
   if (CompletelyUnroll)
     LI->erase(L);

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp

@@ ... @@ #endif
   auto UnrollResult = LoopUnrollResult::Unmodified;
   if (remainderLoop && UnrollRemainder) {
     LLVM_DEBUG(dbgs() << "Unrolling remainder loop\n");
     UnrollResult =
         UnrollLoop(remainderLoop,
                    {/*Count*/ Count - 1, /*TripCount*/ Count - 1,
                     /*Force*/ false, /*AllowRuntime*/ false,
                     /*AllowExpensiveTripCount*/ false, /*TripMultiple*/ 1,
-                    /*PeelCount*/ 0, /*UnrollRemainder*/ false, ForgetAllSCEV},
+                    /*UnrollRemainder*/ false, ForgetAllSCEV},
                    LI, SE, DT, AC, TTI, /*ORE*/ nullptr, PreserveLCSSA);
   }

   if (ResultLoop && UnrollResult != LoopUnrollResult::FullyUnrolled)
     *ResultLoop = remainderLoop;
   NumRuntimeUnrolled++;
   return true;
 }

llvm/test/Transforms/LoopUnroll/peel-loop-and-unroll.ll

This file was added.

+; RUN: not --crash opt -loop-unroll -unroll-peel-count=2 -unroll-count=2 -S < %s 2>&1 | FileCheck %s
+
+; CHECK: LLVM ERROR: Cannot specify both explicit peel count and explicit unroll count
+
+@a = global [8 x i32] zeroinitializer, align 16
+
+define void @test1() {
+entry:
+  br label %for.body
+
+for.body: ; preds = %entry, %for.body
+  %indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
+  %arrayidx = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 %indvars.iv
+  %0 = trunc i64 %indvars.iv to i32
+  store i32 %0, i32* %arrayidx, align 4
+  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
+  %exitcond = icmp ne i64 %indvars.iv.next, 8
+  br i1 %exitcond, label %for.body, label %for.exit
+
+for.exit: ; preds = %for.body
+  ret void
+}

llvm/test/Transforms/LoopUnroll/pr33437.ll

 ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
-; RUN: opt -S -loop-unroll -unroll-count=4 -unroll-peel-count=1 < %s | FileCheck %s
+; RUN: opt -S -loop-unroll -unroll-peel-count=1 < %s | FileCheck %s

 declare zeroext i8 @patatino()

 define fastcc void @tinky() {
 ; CHECK-LABEL: @tinky(
 ; CHECK-NEXT: entry:
 ; CHECK-NEXT: br label [[NEXT:%.*]]
+; CHECK: loopexit.loopexit:
+; CHECK-NEXT: br label [[LOOPEXIT:%.*]]
 ; CHECK: loopexit:
 ; CHECK-NEXT: ret void
 ; CHECK: next:
 ; CHECK-NEXT: br label [[LOOP_PEEL_BEGIN:%.*]]
 ; CHECK: loop.peel.begin:
 ; CHECK-NEXT: br label [[LOOP_PEEL:%.*]]
 ; CHECK: loop.peel:
 ; CHECK-NEXT: [[CALL593_PEEL:%.*]] = tail call zeroext i8 @patatino()
-; CHECK-NEXT: br i1 false, label [[LOOP_PEEL_NEXT:%.*]], label [[LOOPEXIT:%.*]]
+; CHECK-NEXT: br i1 false, label [[LOOP_PEEL_NEXT:%.*]], label [[LOOPEXIT]]
 ; CHECK: loop.peel.next:
 ; CHECK-NEXT: br label [[LOOP_PEEL_NEXT1:%.*]]
 ; CHECK: loop.peel.next1:
 ; CHECK-NEXT: br label [[NEXT_PEEL_NEWPH:%.*]]
 ; CHECK: next.peel.newph:
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
 ; CHECK-NEXT: [[CALL593:%.*]] = tail call zeroext i8 @patatino()
-; CHECK-NEXT: br label [[LOOPEXIT]]
+; CHECK-NEXT: br i1 false, label [[LOOP]], label [[LOOPEXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
 ;
 entry:
   br label %next

 loopexit:
   ret void

 next:
@@ ... @@
 ; CHECK-NEXT: br i1 [[COND_PEEL]], label [[LOOP_PEEL_NEXT:%.*]], label [[LOOPEXIT]]
 ; CHECK: loop.peel.next:
 ; CHECK-NEXT: br label [[LOOP_PEEL_NEXT1:%.*]]
 ; CHECK: loop.peel.next1:
 ; CHECK-NEXT: br label [[NEXT_PEEL_NEWPH:%.*]]
 ; CHECK: next.peel.newph:
 ; CHECK-NEXT: br label [[LOOP:%.*]]
 ; CHECK: loop:
-; CHECK-NEXT: [[B:%.*]] = phi i32 [ [[B_NEXT_PEEL]], [[NEXT_PEEL_NEWPH]] ], [ [[B_NEXT_3:%.*]], [[LOOP_2:%.*]] ]
+; CHECK-NEXT: [[B:%.*]] = phi i32 [ [[B_NEXT_PEEL]], [[NEXT_PEEL_NEWPH]] ], [ [[B_NEXT:%.*]], [[LOOP]] ]
 ; CHECK-NEXT: [[CALL593:%.*]] = tail call zeroext i8 @patatino()
-; CHECK-NEXT: [[B_NEXT:%.*]] = add nuw nsw i32 [[B]], 1
-; CHECK-NEXT: [[CALL593_1:%.*]] = tail call zeroext i8 @patatino()
-; CHECK-NEXT: [[B_NEXT_1:%.*]] = add nuw nsw i32 [[B_NEXT]], 1
-; CHECK-NEXT: [[COND_1:%.*]] = icmp ne i32 [[B_NEXT]], 30
-; CHECK-NEXT: br i1 [[COND_1]], label [[LOOP_2]], label [[LOOPEXIT_LOOPEXIT:%.*]], !llvm.loop !0
-; CHECK: loop.2:
-; CHECK-NEXT: [[CALL593_2:%.*]] = tail call zeroext i8 @patatino()
-; CHECK-NEXT: [[B_NEXT_2:%.*]] = add nuw nsw i32 [[B_NEXT_1]], 1
-; CHECK-NEXT: [[CALL593_3:%.*]] = tail call zeroext i8 @patatino()
-; CHECK-NEXT: [[B_NEXT_3]] = add nuw nsw i32 [[B_NEXT_2]], 1
-; CHECK-NEXT: br label [[LOOP]], !llvm.loop !2
+; CHECK-NEXT: [[B_NEXT]] = add nuw nsw i32 [[B]], 1
+; CHECK-NEXT: [[COND:%.*]] = icmp ne i32 [[B]], 30
+; CHECK-NEXT: br i1 [[COND]], label [[LOOP]], label [[LOOPEXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP3:![0-9]+]]
 ;
 entry:
   br label %next

 loopexit:
   ret void

 next:

llvm/test/Transforms/LoopUnroll/pr45939-peel-count-and-complete-unroll.ll

	; NOTE: Assertions have been autogenerated by utils/update_test_checks.py			; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
	; RUN: opt -loop-unroll -unroll-peel-count=2 -S %s | FileCheck --check-prefix=PEEL2 %s			; RUN: opt -loop-unroll -unroll-peel-count=2 -S %s | FileCheck --check-prefix=PEEL2 %s
	; RUN: opt -loop-unroll -unroll-peel-count=8 -S %s | FileCheck --check-prefix=PEEL8 %s			; RUN: opt -loop-unroll -unroll-peel-count=8 -S %s | FileCheck --check-prefix=PEEL8 %s
	; RUN: opt -loop-unroll -unroll-peel-count=2 -unroll-count=2 -S %s | FileCheck --check-prefix=PEEL2UNROLL2 %s

	; Test case for PR45939. Make sure unroll count is adjusted when loop is peeled and unrolled.			; Test case for PR45939. Make sure unroll count is adjusted when loop is peeled and unrolled.

	@a = global [8 x i32] zeroinitializer, align 16			@a = global [8 x i32] zeroinitializer, align 16

	define void @test1() {			define void @test1() {
	; PEEL2-LABEL: @test1(			; PEEL2-LABEL: @test1(
	; PEEL2-NEXT: entry:			; PEEL2-NEXT: entry:
	[... 18 unchanged lines not shown ...]
	; PEEL2-NEXT: br i1 [[EXITCOND_PEEL5]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_EXIT]]			; PEEL2-NEXT: br i1 [[EXITCOND_PEEL5]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_EXIT]]
	; PEEL2: for.body.peel.next1:			; PEEL2: for.body.peel.next1:
	; PEEL2-NEXT: br label [[FOR_BODY_PEEL_NEXT6:%.*]]			; PEEL2-NEXT: br label [[FOR_BODY_PEEL_NEXT6:%.*]]
	; PEEL2: for.body.peel.next6:			; PEEL2: for.body.peel.next6:
	; PEEL2-NEXT: br label [[ENTRY_PEEL_NEWPH:%.*]]			; PEEL2-NEXT: br label [[ENTRY_PEEL_NEWPH:%.*]]
	; PEEL2: entry.peel.newph:			; PEEL2: entry.peel.newph:
	; PEEL2-NEXT: br label [[FOR_BODY:%.*]]			; PEEL2-NEXT: br label [[FOR_BODY:%.*]]
	; PEEL2: for.body:			; PEEL2: for.body:
	; PEEL2-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PEEL4]], [[ENTRY_PEEL_NEWPH]] ], [ [[INDVARS_IV_NEXT_7:%.*]], [[FOR_BODY_6:%.*]] ]			; PEEL2-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PEEL4]], [[ENTRY_PEEL_NEWPH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; PEEL2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV]]			; PEEL2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV]]
	; PEEL2-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; PEEL2-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; PEEL2-NEXT: store i32 [[TMP2]], i32* [[ARRAYIDX]], align 4			; PEEL2-NEXT: store i32 [[TMP2]], i32* [[ARRAYIDX]], align 4
	; PEEL2-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1			; PEEL2-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; PEEL2-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT]]			; PEEL2-NEXT: [[EXITCOND:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT]], 8
	; PEEL2-NEXT: [[TMP3:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32			; PEEL2-NEXT: br i1 [[EXITCOND]], label [[FOR_BODY]], label [[FOR_EXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
	; PEEL2-NEXT: store i32 [[TMP3]], i32* [[ARRAYIDX_1]], align 4
	; PEEL2-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT]], 1
	; PEEL2-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_1]]
	; PEEL2-NEXT: [[TMP4:%.*]] = trunc i64 [[INDVARS_IV_NEXT_1]] to i32
	; PEEL2-NEXT: store i32 [[TMP4]], i32* [[ARRAYIDX_2]], align 4
	; PEEL2-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_1]], 1
	; PEEL2-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_2]]
	; PEEL2-NEXT: [[TMP5:%.*]] = trunc i64 [[INDVARS_IV_NEXT_2]] to i32
	; PEEL2-NEXT: store i32 [[TMP5]], i32* [[ARRAYIDX_3]], align 4
	; PEEL2-NEXT: [[INDVARS_IV_NEXT_3:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_2]], 1
	; PEEL2-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_3]]
	; PEEL2-NEXT: [[TMP6:%.*]] = trunc i64 [[INDVARS_IV_NEXT_3]] to i32
	; PEEL2-NEXT: store i32 [[TMP6]], i32* [[ARRAYIDX_4]], align 4
	; PEEL2-NEXT: [[INDVARS_IV_NEXT_4:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_3]], 1
	; PEEL2-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_4]]
	; PEEL2-NEXT: [[TMP7:%.*]] = trunc i64 [[INDVARS_IV_NEXT_4]] to i32
	; PEEL2-NEXT: store i32 [[TMP7]], i32* [[ARRAYIDX_5]], align 4
	; PEEL2-NEXT: [[INDVARS_IV_NEXT_5:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_4]], 1
	; PEEL2-NEXT: [[EXITCOND_5:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT_5]], 8
	; PEEL2-NEXT: br i1 [[EXITCOND_5]], label [[FOR_BODY_6]], label [[FOR_EXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
	; PEEL2: for.exit.loopexit:			; PEEL2: for.exit.loopexit:
	; PEEL2-NEXT: br label [[FOR_EXIT]]			; PEEL2-NEXT: br label [[FOR_EXIT]]
	; PEEL2: for.exit:			; PEEL2: for.exit:
	; PEEL2-NEXT: ret void			; PEEL2-NEXT: ret void
	; PEEL2: for.body.6:
	; PEEL2-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_5]]
	; PEEL2-NEXT: [[TMP8:%.*]] = trunc i64 [[INDVARS_IV_NEXT_5]] to i32
	; PEEL2-NEXT: store i32 [[TMP8]], i32* [[ARRAYIDX_6]], align 4
	; PEEL2-NEXT: [[INDVARS_IV_NEXT_6:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_5]], 1
	; PEEL2-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_6]]
	; PEEL2-NEXT: [[TMP9:%.*]] = trunc i64 [[INDVARS_IV_NEXT_6]] to i32
	; PEEL2-NEXT: store i32 [[TMP9]], i32* [[ARRAYIDX_7]], align 4
	; PEEL2-NEXT: [[INDVARS_IV_NEXT_7]] = add nuw nsw i64 [[INDVARS_IV_NEXT_6]], 1
	; PEEL2-NEXT: br label [[FOR_BODY]], !llvm.loop [[LOOP2:![0-9]+]]
	;			;
	; PEEL8-LABEL: @test1(			; PEEL8-LABEL: @test1(
	; PEEL8-NEXT: entry:			; PEEL8-NEXT: entry:
	; PEEL8-NEXT: br label [[FOR_BODY_PEEL_BEGIN:%.*]]			; PEEL8-NEXT: br label [[FOR_BODY_PEEL_BEGIN:%.*]]
	; PEEL8: for.body.peel.begin:			; PEEL8: for.body.peel.begin:
	; PEEL8-NEXT: br label [[FOR_BODY_PEEL:%.*]]			; PEEL8-NEXT: br label [[FOR_BODY_PEEL:%.*]]
	; PEEL8: for.body.peel:			; PEEL8: for.body.peel:
	; PEEL8-NEXT: [[ARRAYIDX_PEEL:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 0			; PEEL8-NEXT: [[ARRAYIDX_PEEL:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 0
	[... 67 unchanged lines not shown ...]
	; PEEL8-NEXT: br i1 [[EXITCOND_PEEL35]], label [[FOR_BODY_PEEL_NEXT31:%.*]], label [[FOR_EXIT]]			; PEEL8-NEXT: br i1 [[EXITCOND_PEEL35]], label [[FOR_BODY_PEEL_NEXT31:%.*]], label [[FOR_EXIT]]
	; PEEL8: for.body.peel.next31:			; PEEL8: for.body.peel.next31:
	; PEEL8-NEXT: br label [[FOR_BODY_PEEL_NEXT36:%.*]]			; PEEL8-NEXT: br label [[FOR_BODY_PEEL_NEXT36:%.*]]
	; PEEL8: for.body.peel.next36:			; PEEL8: for.body.peel.next36:
	; PEEL8-NEXT: br label [[ENTRY_PEEL_NEWPH:%.*]]			; PEEL8-NEXT: br label [[ENTRY_PEEL_NEWPH:%.*]]
	; PEEL8: entry.peel.newph:			; PEEL8: entry.peel.newph:
	; PEEL8-NEXT: br label [[FOR_BODY:%.*]]			; PEEL8-NEXT: br label [[FOR_BODY:%.*]]
	; PEEL8: for.body:			; PEEL8: for.body:
	; PEEL8-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PEEL34]], [[ENTRY_PEEL_NEWPH]] ], [ [[INDVARS_IV_NEXT_7:%.*]], [[FOR_BODY_7:%.*]] ]			; PEEL8-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PEEL34]], [[ENTRY_PEEL_NEWPH]] ], [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY]] ]
	; PEEL8-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV]]			; PEEL8-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV]]
	; PEEL8-NEXT: [[TMP8:%.*]] = trunc i64 [[INDVARS_IV]] to i32			; PEEL8-NEXT: [[TMP8:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; PEEL8-NEXT: store i32 [[TMP8]], i32* [[ARRAYIDX]], align 4			; PEEL8-NEXT: store i32 [[TMP8]], i32* [[ARRAYIDX]], align 4
	; PEEL8-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1			; PEEL8-NEXT: [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; PEEL8-NEXT: br i1 true, label [[FOR_BODY_1:%.*]], label [[FOR_EXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]			; PEEL8-NEXT: br i1 true, label [[FOR_BODY]], label [[FOR_EXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
	; PEEL8: for.exit.loopexit:			; PEEL8: for.exit.loopexit:
	; PEEL8-NEXT: br label [[FOR_EXIT]]			; PEEL8-NEXT: br label [[FOR_EXIT]]
	; PEEL8: for.exit:			; PEEL8: for.exit:
	; PEEL8-NEXT: ret void			; PEEL8-NEXT: ret void
	; PEEL8: for.body.1:
	; PEEL8-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT]]
	; PEEL8-NEXT: [[TMP9:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; PEEL8-NEXT: store i32 [[TMP9]], i32* [[ARRAYIDX_1]], align 4
	; PEEL8-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT]], 1
	; PEEL8-NEXT: br i1 true, label [[FOR_BODY_2:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
	; PEEL8: for.body.2:
	; PEEL8-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_1]]
	; PEEL8-NEXT: [[TMP10:%.*]] = trunc i64 [[INDVARS_IV_NEXT_1]] to i32
	; PEEL8-NEXT: store i32 [[TMP10]], i32* [[ARRAYIDX_2]], align 4
	; PEEL8-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_1]], 1
	; PEEL8-NEXT: br i1 true, label [[FOR_BODY_3:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
	; PEEL8: for.body.3:
	; PEEL8-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_2]]
	; PEEL8-NEXT: [[TMP11:%.*]] = trunc i64 [[INDVARS_IV_NEXT_2]] to i32
	; PEEL8-NEXT: store i32 [[TMP11]], i32* [[ARRAYIDX_3]], align 4
	; PEEL8-NEXT: [[INDVARS_IV_NEXT_3:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_2]], 1
	; PEEL8-NEXT: br i1 true, label [[FOR_BODY_4:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
	; PEEL8: for.body.4:
	; PEEL8-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_3]]
	; PEEL8-NEXT: [[TMP12:%.*]] = trunc i64 [[INDVARS_IV_NEXT_3]] to i32
	; PEEL8-NEXT: store i32 [[TMP12]], i32* [[ARRAYIDX_4]], align 4
	; PEEL8-NEXT: [[INDVARS_IV_NEXT_4:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_3]], 1
	; PEEL8-NEXT: br i1 true, label [[FOR_BODY_5:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
	; PEEL8: for.body.5:
	; PEEL8-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_4]]
	; PEEL8-NEXT: [[TMP13:%.*]] = trunc i64 [[INDVARS_IV_NEXT_4]] to i32
	; PEEL8-NEXT: store i32 [[TMP13]], i32* [[ARRAYIDX_5]], align 4
	; PEEL8-NEXT: [[INDVARS_IV_NEXT_5:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_4]], 1
	; PEEL8-NEXT: br i1 true, label [[FOR_BODY_6:%.*]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
	; PEEL8: for.body.6:
	; PEEL8-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_5]]
	; PEEL8-NEXT: [[TMP14:%.*]] = trunc i64 [[INDVARS_IV_NEXT_5]] to i32
	; PEEL8-NEXT: store i32 [[TMP14]], i32* [[ARRAYIDX_6]], align 4
	; PEEL8-NEXT: [[INDVARS_IV_NEXT_6:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_5]], 1
	; PEEL8-NEXT: br i1 true, label [[FOR_BODY_7]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP0]]
	; PEEL8: for.body.7:
	; PEEL8-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_6]]
	; PEEL8-NEXT: [[TMP15:%.*]] = trunc i64 [[INDVARS_IV_NEXT_6]] to i32
	; PEEL8-NEXT: store i32 [[TMP15]], i32* [[ARRAYIDX_7]], align 4
	; PEEL8-NEXT: [[INDVARS_IV_NEXT_7]] = add nuw nsw i64 [[INDVARS_IV_NEXT_6]], 1
	; PEEL8-NEXT: br i1 true, label [[FOR_BODY]], label [[FOR_EXIT_LOOPEXIT]], !llvm.loop [[LOOP2:![0-9]+]]
	;
	; PEEL2UNROLL2-LABEL: @test1(
	; PEEL2UNROLL2-NEXT: entry:
	; PEEL2UNROLL2-NEXT: br label [[FOR_BODY_PEEL_BEGIN:%.*]]
	; PEEL2UNROLL2: for.body.peel.begin:
	; PEEL2UNROLL2-NEXT: br label [[FOR_BODY_PEEL:%.*]]
	; PEEL2UNROLL2: for.body.peel:
	; PEEL2UNROLL2-NEXT: [[ARRAYIDX_PEEL:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 0
	; PEEL2UNROLL2-NEXT: [[TMP0:%.*]] = trunc i64 0 to i32
	; PEEL2UNROLL2-NEXT: store i32 [[TMP0]], i32* [[ARRAYIDX_PEEL]], align 4
	; PEEL2UNROLL2-NEXT: [[INDVARS_IV_NEXT_PEEL:%.*]] = add nuw nsw i64 0, 1
	; PEEL2UNROLL2-NEXT: [[EXITCOND_PEEL:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT_PEEL]], 8
	; PEEL2UNROLL2-NEXT: br i1 [[EXITCOND_PEEL]], label [[FOR_BODY_PEEL_NEXT:%.*]], label [[FOR_EXIT:%.*]]
	; PEEL2UNROLL2: for.body.peel.next:
	; PEEL2UNROLL2-NEXT: br label [[FOR_BODY_PEEL2:%.*]]
	; PEEL2UNROLL2: for.body.peel2:
	; PEEL2UNROLL2-NEXT: [[ARRAYIDX_PEEL3:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT_PEEL]]
	; PEEL2UNROLL2-NEXT: [[TMP1:%.*]] = trunc i64 [[INDVARS_IV_NEXT_PEEL]] to i32
	; PEEL2UNROLL2-NEXT: store i32 [[TMP1]], i32* [[ARRAYIDX_PEEL3]], align 4
	; PEEL2UNROLL2-NEXT: [[INDVARS_IV_NEXT_PEEL4:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_PEEL]], 1
	; PEEL2UNROLL2-NEXT: [[EXITCOND_PEEL5:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT_PEEL4]], 8
	; PEEL2UNROLL2-NEXT: br i1 [[EXITCOND_PEEL5]], label [[FOR_BODY_PEEL_NEXT1:%.*]], label [[FOR_EXIT]]
	; PEEL2UNROLL2: for.body.peel.next1:
	; PEEL2UNROLL2-NEXT: br label [[FOR_BODY_PEEL_NEXT6:%.*]]
	; PEEL2UNROLL2: for.body.peel.next6:
	; PEEL2UNROLL2-NEXT: br label [[ENTRY_PEEL_NEWPH:%.*]]
	; PEEL2UNROLL2: entry.peel.newph:
	; PEEL2UNROLL2-NEXT: br label [[FOR_BODY:%.*]]
	; PEEL2UNROLL2: for.body:
	; PEEL2UNROLL2-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_PEEL4]], [[ENTRY_PEEL_NEWPH]] ], [ [[INDVARS_IV_NEXT_1:%.*]], [[FOR_BODY]] ]
	; PEEL2UNROLL2-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV]]
	; PEEL2UNROLL2-NEXT: [[TMP2:%.*]] = trunc i64 [[INDVARS_IV]] to i32
	; PEEL2UNROLL2-NEXT: store i32 [[TMP2]], i32* [[ARRAYIDX]], align 4
	; PEEL2UNROLL2-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
	; PEEL2UNROLL2-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 [[INDVARS_IV_NEXT]]
	; PEEL2UNROLL2-NEXT: [[TMP3:%.*]] = trunc i64 [[INDVARS_IV_NEXT]] to i32
	; PEEL2UNROLL2-NEXT: store i32 [[TMP3]], i32* [[ARRAYIDX_1]], align 4
	; PEEL2UNROLL2-NEXT: [[INDVARS_IV_NEXT_1]] = add nuw nsw i64 [[INDVARS_IV_NEXT]], 1
	; PEEL2UNROLL2-NEXT: [[EXITCOND_1:%.*]] = icmp ne i64 [[INDVARS_IV_NEXT_1]], 8
	; PEEL2UNROLL2-NEXT: br i1 [[EXITCOND_1]], label [[FOR_BODY]], label [[FOR_EXIT_LOOPEXIT:%.*]], !llvm.loop [[LOOP0:![0-9]+]]
	; PEEL2UNROLL2: for.exit.loopexit:
	; PEEL2UNROLL2-NEXT: br label [[FOR_EXIT]]
	; PEEL2UNROLL2: for.exit:
	; PEEL2UNROLL2-NEXT: ret void
	;			;
	entry:			entry:
	br label %for.body			br label %for.body

	for.body: ; preds = %entry, %for.body			for.body: ; preds = %entry, %for.body
	%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]			%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
	%arrayidx = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 %indvars.iv			%arrayidx = getelementptr inbounds [8 x i32], [8 x i32]* @a, i64 0, i64 %indvars.iv
	%0 = trunc i64 %indvars.iv to i32			%0 = trunc i64 %indvars.iv to i32
	store i32 %0, i32* %arrayidx, align 4			store i32 %0, i32* %arrayidx, align 4
	%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1			%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
	%exitcond = icmp ne i64 %indvars.iv.next, 8			%exitcond = icmp ne i64 %indvars.iv.next, 8
	br i1 %exitcond, label %for.body, label %for.exit			br i1 %exitcond, label %for.body, label %for.exit

	for.exit: ; preds = %for.body			for.exit: ; preds = %for.body
	ret void			ret void
	}			}

llvm/test/Transforms/LoopUnroll/wrong_assert_in_peeling.ll

	[... 33 unchanged lines not shown ...]
	; CHECK-NEXT: br label [[BB1_PEEL_NEWPH:%.*]]			; CHECK-NEXT: br label [[BB1_PEEL_NEWPH:%.*]]
	; CHECK: bb1.peel.newph:			; CHECK: bb1.peel.newph:
	; CHECK-NEXT: br label [[BB2:%.*]]			; CHECK-NEXT: br label [[BB2:%.*]]
	; CHECK: bb2:			; CHECK: bb2:
	; CHECK-NEXT: [[TMP3:%.*]] = phi i32 [ [[TMP4_PEEL]], [[BB1_PEEL_NEWPH]] ], [ [[TMP4:%.*]], [[BB12:%.*]] ]			; CHECK-NEXT: [[TMP3:%.*]] = phi i32 [ [[TMP4_PEEL]], [[BB1_PEEL_NEWPH]] ], [ [[TMP4:%.*]], [[BB12:%.*]] ]
	; CHECK-NEXT: [[TMP4]] = add nsw i32 [[TMP3]], [[TMP]]			; CHECK-NEXT: [[TMP4]] = add nsw i32 [[TMP3]], [[TMP]]
	; CHECK-NEXT: br label [[BB5:%.*]]			; CHECK-NEXT: br label [[BB5:%.*]]
	; CHECK: bb5:			; CHECK: bb5:
	; CHECK-NEXT: br i1 false, label [[BB7:%.*]], label [[BB15_LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB7:%.*]], label [[BB15_LOOPEXIT:%.*]]
	; CHECK: bb7:			; CHECK: bb7:
	; CHECK-NEXT: br i1 undef, label [[BB10:%.*]], label [[BB10]]			; CHECK-NEXT: br i1 undef, label [[BB10:%.*]], label [[BB10]]
	; CHECK: bb10:			; CHECK: bb10:
	; CHECK-NEXT: br i1 false, label [[BB12]], label [[BB17_LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 undef, label [[BB12]], label [[BB17_LOOPEXIT:%.*]]
	; CHECK: bb12:			; CHECK: bb12:
	; CHECK-NEXT: br i1 false, label [[BB13_LOOPEXIT:%.*]], label [[BB2]], !llvm.loop !0			; CHECK-NEXT: br i1 false, label [[BB13_LOOPEXIT:%.*]], label [[BB2]], !llvm.loop [[LOOP0:![0-9]+]]
	; CHECK: bb13.loopexit:			; CHECK: bb13.loopexit:
	; CHECK-NEXT: br label [[BB13]]			; CHECK-NEXT: br label [[BB13]]
	; CHECK: bb13:			; CHECK: bb13:
	; CHECK-NEXT: [[TMP14]] = add nsw i32 [[TMP]], -1			; CHECK-NEXT: [[TMP14]] = add nsw i32 [[TMP]], -1
	; CHECK-NEXT: br label [[BB1]]			; CHECK-NEXT: br label [[BB1]]
	; CHECK: bb15.loopexit:			; CHECK: bb15.loopexit:
	; CHECK-NEXT: br label [[BB15:%.*]]			; CHECK-NEXT: br label [[BB15:%.*]]
	; CHECK: bb15.loopexit2:			; CHECK: bb15.loopexit2:
	[... 54 unchanged lines not shown ...]

Diff 350040
