This is an archive of the discontinued LLVM Phabricator instance.

Compute estimated trip counts for multiple exit loops
ClosedPublic

Authored by reames on Dec 8 2021, 10:30 AM.


Details

Reviewers

anna
apilipenko
lebedev.ri
nikic
Whitney
hoy

Commits

rG2d31b02517c0: Compute estimated trip counts for multiple exit loops

Summary

This change allows us to estimate the trip count from profile metadata for all multiple-exit loops. We still derive the estimate only from the latch, but that's fine: at worst, it causes us to overestimate the trip count.
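To make the latch-only estimate concrete, here is a minimal standalone sketch of the arithmetic. `LatchWeights` and `estimateTripCountFromLatch` are hypothetical names used only for illustration, the local `divideNearest` is assumed to round to nearest in the same way as `llvm::divideNearest`, and the bailout on a zero exit weight is an assumption rather than something visible in this diff.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the !prof branch weights on the latch terminator.
struct LatchWeights {
  uint64_t BackedgeTaken; // weight of the latch -> header edge
  uint64_t LatchExit;     // weight of the latch -> exit edge
};

// Round-to-nearest unsigned division (assumed to mirror llvm::divideNearest).
static uint64_t divideNearest(uint64_t Numerator, uint64_t Denominator) {
  return (Numerator + Denominator / 2) / Denominator;
}

// Estimate the trip count from the latch branch alone, ignoring other exits.
std::optional<unsigned> estimateTripCountFromLatch(const LatchWeights &W) {
  // Assumption: with no recorded latch exits there is nothing to divide by.
  if (W.LatchExit == 0)
    return std::nullopt;
  uint64_t BackedgeTakenCount = divideNearest(W.BackedgeTaken, W.LatchExit);
  // Estimated trip count is one plus the estimated backedge-taken count.
  return static_cast<unsigned>(BackedgeTakenCount + 1);
}
```

For example, take 10 loop entries and a backedge weight of 70, with the exits split evenly between the latch and another exit (5 each). The header then runs 80 times, about 8 iterations per entry, while the latch-only estimate is 70 / 5 + 1 = 15. The estimate errs on the high side, which is the direction described above.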

Reviewing the uses of the API, all but one are cases where we restrict a loop transformation (unroll and vectorize, respectively) when we know the trip count is short enough. As a result, the change makes these passes strictly less aggressive. The test change illustrates a case where we would previously have runtime unrolled a loop that ran fewer iterations than the unroll factor. This is definitely unprofitable.
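As a rough illustration of why a newly available estimate can only make such callers less aggressive, consider a gate of the following shape. This is a hypothetical sketch, not the actual unroll or vectorize heuristic; `allowRuntimeUnroll` and its parameters are invented for the example.

```cpp
#include <optional>

// Hypothetical gate: decide whether runtime unrolling may proceed given an
// optional profile-based trip count estimate and the chosen unroll factor.
bool allowRuntimeUnroll(std::optional<unsigned> EstimatedTripCount,
                        unsigned UnrollFactor) {
  // No estimate available: this check does not restrict the transform.
  if (!EstimatedTripCount)
    return true;
  // Estimate available: reject loops expected to run fewer iterations than
  // the unroll factor, since the unrolled body would rarely execute.
  return *EstimatedTripCount >= UnrollFactor;
}
```

Under this shape, a multiple-exit loop that used to yield no estimate always passed the check. Once an estimate is produced, the check can only turn a "yes" into a "no", never the reverse.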

The one case where an upper bound on the estimated trip count could drive a more aggressive transform is peeling, so I duplicated the logic being removed from the generic estimation there to keep its behavior the same. The resulting heuristic makes no sense and should probably be removed immediately, but we can do that in a separate change.

This was noticed when analyzing regressions on D113939.

I plan to come back and incorporate estimated trip counts from other exits, but that's a minor improvement which can follow separately.

Diff Detail

Event Timeline

reames created this revision. Dec 8 2021, 10:30 AM

Herald added subscribers: zzheng, bollu, hiraditya, mcrosier. Dec 8 2021, 10:30 AM

reames requested review of this revision. Dec 8 2021, 10:30 AM

Herald added a project: Restricted Project. Dec 8 2021, 10:30 AM

reames mentioned this in D113939: [runtime-unroll] Remove restriction about unrolling multiple exit loops. Dec 8 2021, 10:32 AM

reames added a child revision: D113939: [runtime-unroll] Remove restriction about unrolling multiple exit loops.

Harbormaster completed remote builds in B138203: Diff 392826. Dec 8 2021, 11:13 AM

Nice find! This is obviously a good thing to do for unrolling. Any ideas why there's no vectorization test cases to update since we do support multi-exit vectorization?

In D115362#3180798, @anna wrote:

Nice find! This is obviously a good thing to do for unrolling. Any ideas why there's no vectorization test cases to update since we do support multi-exit vectorization?

Well, purely because I didn't write one. The unrolling test change didn't exist either before I wrote the one showing a diff this morning. Note that even that one requires a non-default flag value to show up.

I don't think the change in vectorization is likely to be interesting, and crafting a test will be tedious. Do you want me to bother?

LGTM.

In D115362#3180931, @reames wrote:

In D115362#3180798, @anna wrote:

Nice find! This is obviously a good thing to do for unrolling. Any ideas why there's no vectorization test cases to update since we do support multi-exit vectorization?

Well, purely because I didn't write one. The unrolling test change didn't exist either before I wrote the one showing a diff this morning. Note that even that one requires a non-default flag value to show up.

I don't think the change in vectorization is likely to be interesting, and crafting a test will be tedious. Do you want me to bother?

That's fine. Was just curious on how this would show up with vectorization. Given it is tedious and non-obvious, the patch as-is looks good.

This revision is now accepted and ready to land. Dec 9 2021, 6:35 AM

This revision was landed with ongoing or failed builds. Dec 9 2021, 9:54 AM

Closed by commit rG2d31b02517c0: Compute estimated trip counts for multiple exit loops (authored by reames).

This revision was automatically updated to reflect the committed changes.

reames added a commit: rG2d31b02517c0: Compute estimated trip counts for multiple exit loops.

Revision Contents

Path: Size

llvm/lib/Transforms/Utils/LoopPeel.cpp: 27 lines

llvm/lib/Transforms/Utils/LoopUtils.cpp: 22 lines

llvm/test/Transforms/LoopUnroll/runtime-multiexit-heuristic.ll: 135 lines

Diff 392826

llvm/lib/Transforms/Utils/LoopPeel.cpp

[...]	for (auto *BB : L.blocks()) {
}		}

DesiredPeelCount = std::max(DesiredPeelCount, NewPeelCount);		DesiredPeelCount = std::max(DesiredPeelCount, NewPeelCount);
}		}

return DesiredPeelCount;		return DesiredPeelCount;
}		}

		/// This "heuristic" exactly matches implicit behavior which used to exist
		/// inside getLoopEstimatedTripCount. It was added here to keep an
		/// improvement inside that API from causing peeling to become more agressive.
		/// This should probably be removed.
		static bool violatesLegacyMultiExitLoopCheck(Loop *L) {
		BasicBlock *Latch = L->getLoopLatch();
		if (!Latch)
		return true;

		BranchInst *LatchBR = dyn_cast<BranchInst>(Latch->getTerminator());
		if (!LatchBR || LatchBR->getNumSuccessors() != 2 || !L->isLoopExiting(Latch))
		return true;

		assert((LatchBR->getSuccessor(0) == L->getHeader() ||
		LatchBR->getSuccessor(1) == L->getHeader()) &&
		"At least one edge out of the latch must go to the header");

		SmallVector<BasicBlock *, 4> ExitBlocks;
		L->getUniqueNonLatchExitBlocks(ExitBlocks);
		return any_of(ExitBlocks, [](const BasicBlock *EB) {
		return !EB->getTerminatingDeoptimizeCall();
		});
		}


// Return the number of iterations we want to peel off.		// Return the number of iterations we want to peel off.
void llvm::computePeelCount(Loop *L, unsigned LoopSize,		void llvm::computePeelCount(Loop *L, unsigned LoopSize,
TargetTransformInfo::PeelingPreferences &PP,		TargetTransformInfo::PeelingPreferences &PP,
unsigned &TripCount, DominatorTree &DT,		unsigned &TripCount, DominatorTree &DT,
ScalarEvolution &SE, unsigned Threshold) {		ScalarEvolution &SE, unsigned Threshold) {
assert(LoopSize > 0 && "Zero loop size is not allowed!");		assert(LoopSize > 0 && "Zero loop size is not allowed!");
// Save the PP.PeelCount value set by the target in		// Save the PP.PeelCount value set by the target in
// TTI.getPeelingPreferences or by the flag -unroll-peel-count.		// TTI.getPeelingPreferences or by the flag -unroll-peel-count.
[...]	void llvm::computePeelCount(Loop *L, unsigned LoopSize,
if (!PP.PeelProfiledIterations)		if (!PP.PeelProfiledIterations)
return;		return;
// If we don't know the trip count, but have reason to believe the average		// If we don't know the trip count, but have reason to believe the average
// trip count is low, peeling should be beneficial, since we will usually		// trip count is low, peeling should be beneficial, since we will usually
// hit the peeled section.		// hit the peeled section.
// We only do this in the presence of profile information, since otherwise		// We only do this in the presence of profile information, since otherwise
// our estimates of the trip count are not reliable enough.		// our estimates of the trip count are not reliable enough.
if (L->getHeader()->getParent()->hasProfileData()) {		if (L->getHeader()->getParent()->hasProfileData()) {
		if (violatesLegacyMultiExitLoopCheck(L))
		return;
Optional<unsigned> PeelCount = getLoopEstimatedTripCount(L);		Optional<unsigned> PeelCount = getLoopEstimatedTripCount(L);
if (!PeelCount)		if (!PeelCount)
return;		return;

LLVM_DEBUG(dbgs() << "Profile-based estimated trip count is " << *PeelCount		LLVM_DEBUG(dbgs() << "Profile-based estimated trip count is " << *PeelCount
<< "\n");		<< "\n");

if (*PeelCount) {		if (*PeelCount) {
[...]

llvm/lib/Transforms/Utils/LoopUtils.cpp

[...]	void llvm::breakLoopBackedge(Loop *L, DominatorTree &DT, ScalarEvolution &SE,
// caused a block to be removed from the parent loop (see loop_nest_lcssa		// caused a block to be removed from the parent loop (see loop_nest_lcssa
// test case in zero-btc.ll for an example), thus changing the parent's		// test case in zero-btc.ll for an example), thus changing the parent's
// exit blocks. If that happened, we need to rebuild LCSSA on the outermost		// exit blocks. If that happened, we need to rebuild LCSSA on the outermost
// loop which might have a had a block removed.		// loop which might have a had a block removed.
if (OutermostLoop != L)		if (OutermostLoop != L)
formLCSSARecursively(*OutermostLoop, DT, &LI, &SE);		formLCSSARecursively(*OutermostLoop, DT, &LI, &SE);
}		}


/// Checks if \p L has single exit through latch block except possibly		/// Checks if \p L has an exiting latch branch. There may also be other
/// "deoptimizing" exits. Returns branch instruction terminating the loop		/// exiting blocks. Returns branch instruction terminating the loop
/// latch if above check is successful, nullptr otherwise.		/// latch if above check is successful, nullptr otherwise.
static BranchInst *getExpectedExitLoopLatchBranch(Loop *L) {		static BranchInst *getExpectedExitLoopLatchBranch(Loop *L) {
BasicBlock *Latch = L->getLoopLatch();		BasicBlock *Latch = L->getLoopLatch();
if (!Latch)		if (!Latch)
return nullptr;		return nullptr;

BranchInst *LatchBR = dyn_cast<BranchInst>(Latch->getTerminator());		BranchInst *LatchBR = dyn_cast<BranchInst>(Latch->getTerminator());
if (!LatchBR || LatchBR->getNumSuccessors() != 2 || !L->isLoopExiting(Latch))		if (!LatchBR || LatchBR->getNumSuccessors() != 2 || !L->isLoopExiting(Latch))
return nullptr;		return nullptr;

assert((LatchBR->getSuccessor(0) == L->getHeader() ||		assert((LatchBR->getSuccessor(0) == L->getHeader() ||
LatchBR->getSuccessor(1) == L->getHeader()) &&		LatchBR->getSuccessor(1) == L->getHeader()) &&
"At least one edge out of the latch must go to the header");		"At least one edge out of the latch must go to the header");

SmallVector<BasicBlock *, 4> ExitBlocks;
L->getUniqueNonLatchExitBlocks(ExitBlocks);
if (any_of(ExitBlocks, [](const BasicBlock *EB) {
return !EB->getTerminatingDeoptimizeCall();
}))
return nullptr;

return LatchBR;		return LatchBR;
}		}

Optional<unsigned>		Optional<unsigned>
llvm::getLoopEstimatedTripCount(Loop *L,		llvm::getLoopEstimatedTripCount(Loop *L,
unsigned *EstimatedLoopInvocationWeight) {		unsigned *EstimatedLoopInvocationWeight) {
// Support loops with an exiting latch and other existing exists only		// Currently we take the estimate exit count only from the loop latch,
// deoptimize.		// ignoring other exiting blocks. This can overestimate the trip count
		// if we exit through another exit, but can never underestimate it.
		// TODO: incorporate information from other exits
BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);		BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);
if (!LatchBranch)		if (!LatchBranch)
return None;		return None;

// To estimate the number of times the loop body was executed, we want to		// To estimate the number of times the loop body was executed, we want to
// know the number of times the backedge was taken, vs. the number of times		// know the number of times the backedge was taken, vs. the number of times
// we exited the loop.		// we exited the loop.
uint64_t BackedgeTakenWeight, LatchExitWeight;		uint64_t BackedgeTakenWeight, LatchExitWeight;
[...]	llvm::getLoopEstimatedTripCount(Loop *L,
uint64_t BackedgeTakenCount =		uint64_t BackedgeTakenCount =
llvm::divideNearest(BackedgeTakenWeight, LatchExitWeight);		llvm::divideNearest(BackedgeTakenWeight, LatchExitWeight);
// Estimated trip count is one plus estimated backedge taken count.		// Estimated trip count is one plus estimated backedge taken count.
return BackedgeTakenCount + 1;		return BackedgeTakenCount + 1;
}		}

bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,		bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
unsigned EstimatedloopInvocationWeight) {		unsigned EstimatedloopInvocationWeight) {
// Support loops with an exiting latch and other existing exists only		// At the moment, we currently support changing the estimate trip count of
// deoptimize.		// the latch branch only. We could extend this API to manipulate estimated
		// trip counts for any exit.
BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);		BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);
if (!LatchBranch)		if (!LatchBranch)
return false;		return false;

// Calculate taken and exit weights.		// Calculate taken and exit weights.
unsigned LatchExitWeight = 0;		unsigned LatchExitWeight = 0;
unsigned BackedgeTakenWeight = 0;		unsigned BackedgeTakenWeight = 0;

[...]

llvm/test/Transforms/LoopUnroll/runtime-multiexit-heuristic.ll

	[...]
	; NOUNROLL: latchexit:			; NOUNROLL: latchexit:
	; NOUNROLL-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[ADD]], [[LATCH]] ]			; NOUNROLL-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[ADD]], [[LATCH]] ]
	; NOUNROLL-NEXT: ret i32 [[SUM_0_LCSSA]]			; NOUNROLL-NEXT: ret i32 [[SUM_0_LCSSA]]
	; NOUNROLL: otherexit:			; NOUNROLL: otherexit:
	; NOUNROLL-NEXT: ret i32 57			; NOUNROLL-NEXT: ret i32 57
	;			;
	; ENABLED-LABEL: @test3(			; ENABLED-LABEL: @test3(
	; ENABLED-NEXT: entry:			; ENABLED-NEXT: entry:
	; ENABLED-NEXT: [[TMP0:%.*]] = add i64 [[N:%.*]], -1
	; ENABLED-NEXT: [[XTRAITER:%.*]] = and i64 [[N]], 7
	; ENABLED-NEXT: [[TMP1:%.*]] = icmp ult i64 [[TMP0]], 7
	; ENABLED-NEXT: br i1 [[TMP1]], label [[LATCHEXIT_UNR_LCSSA:%.*]], label [[ENTRY_NEW:%.*]]
	; ENABLED: entry.new:
	; ENABLED-NEXT: [[UNROLL_ITER:%.*]] = sub i64 [[N]], [[XTRAITER]]
	; ENABLED-NEXT: br label [[HEADER:%.*]]			; ENABLED-NEXT: br label [[HEADER:%.*]]
	; ENABLED: header:			; ENABLED: header:
	; ENABLED-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY_NEW]] ], [ [[INDVARS_IV_NEXT_7:%.*]], [[LATCH_7:%.*]] ]			; ENABLED-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[LATCH:%.*]] ], [ 0, [[ENTRY:%.*]] ]
	; ENABLED-NEXT: [[SUM_02:%.*]] = phi i32 [ 0, [[ENTRY_NEW]] ], [ [[ADD_7:%.*]], [[LATCH_7]] ]			; ENABLED-NEXT: [[SUM_02:%.*]] = phi i32 [ [[ADD:%.*]], [[LATCH]] ], [ 0, [[ENTRY]] ]
	; ENABLED-NEXT: [[NITER:%.*]] = phi i64 [ 0, [[ENTRY_NEW]] ], [ [[NITER_NEXT_7:%.*]], [[LATCH_7]] ]
	; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK:%.*]]			; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK:%.*]]
	; ENABLED: for.exiting_block:			; ENABLED: for.exiting_block:
	; ENABLED-NEXT: [[CMP:%.*]] = icmp eq i64 [[N]], 42			; ENABLED-NEXT: [[CMP:%.*]] = icmp eq i64 [[N:%.*]], 42
	; ENABLED-NEXT: br i1 [[CMP]], label [[OTHEREXIT_LOOPEXIT:%.*]], label [[LATCH:%.*]]			; ENABLED-NEXT: br i1 [[CMP]], label [[OTHEREXIT:%.*]], label [[LATCH]]
	; ENABLED: latch:			; ENABLED: latch:
	; ENABLED-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDVARS_IV]]			; ENABLED-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDVARS_IV]]
	; ENABLED-NEXT: [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX]], align 4			; ENABLED-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
	; ENABLED-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP2]], [[SUM_02]]			; ENABLED-NEXT: [[ADD]] = add nsw i32 [[TMP0]], [[SUM_02]]
	; ENABLED-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1			; ENABLED-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1
	; ENABLED-NEXT: [[NITER_NEXT:%.*]] = add nuw nsw i64 [[NITER]], 1			; ENABLED-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
	; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK_1:%.*]]			; ENABLED-NEXT: br i1 [[EXITCOND]], label [[LATCHEXIT:%.*]], label [[HEADER]], !prof [[PROF4:![0-9]+]]
	; ENABLED: for.exiting_block.1:
	; ENABLED-NEXT: [[CMP_1:%.*]] = icmp eq i64 [[N]], 42
	; ENABLED-NEXT: br i1 [[CMP_1]], label [[OTHEREXIT_LOOPEXIT]], label [[LATCH_1:%.*]]
	; ENABLED: latch.1:
	; ENABLED-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT]]
	; ENABLED-NEXT: [[TMP3:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4
	; ENABLED-NEXT: [[ADD_1:%.*]] = add nsw i32 [[TMP3]], [[ADD]]
	; ENABLED-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT]], 1
	; ENABLED-NEXT: [[NITER_NEXT_1:%.*]] = add nuw nsw i64 [[NITER_NEXT]], 1
	; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK_2:%.*]]
	; ENABLED: for.exiting_block.2:
	; ENABLED-NEXT: [[CMP_2:%.*]] = icmp eq i64 [[N]], 42
	; ENABLED-NEXT: br i1 [[CMP_2]], label [[OTHEREXIT_LOOPEXIT]], label [[LATCH_2:%.*]]
	; ENABLED: latch.2:
	; ENABLED-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_1]]
	; ENABLED-NEXT: [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX_2]], align 4
	; ENABLED-NEXT: [[ADD_2:%.*]] = add nsw i32 [[TMP4]], [[ADD_1]]
	; ENABLED-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_1]], 1
	; ENABLED-NEXT: [[NITER_NEXT_2:%.*]] = add nuw nsw i64 [[NITER_NEXT_1]], 1
	; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK_3:%.*]]
	; ENABLED: for.exiting_block.3:
	; ENABLED-NEXT: [[CMP_3:%.*]] = icmp eq i64 [[N]], 42
	; ENABLED-NEXT: br i1 [[CMP_3]], label [[OTHEREXIT_LOOPEXIT]], label [[LATCH_3:%.*]]
	; ENABLED: latch.3:
	; ENABLED-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_2]]
	; ENABLED-NEXT: [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX_3]], align 4
	; ENABLED-NEXT: [[ADD_3:%.*]] = add nsw i32 [[TMP5]], [[ADD_2]]
	; ENABLED-NEXT: [[INDVARS_IV_NEXT_3:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_2]], 1
	; ENABLED-NEXT: [[NITER_NEXT_3:%.*]] = add nuw nsw i64 [[NITER_NEXT_2]], 1
	; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK_4:%.*]]
	; ENABLED: for.exiting_block.4:
	; ENABLED-NEXT: [[CMP_4:%.*]] = icmp eq i64 [[N]], 42
	; ENABLED-NEXT: br i1 [[CMP_4]], label [[OTHEREXIT_LOOPEXIT]], label [[LATCH_4:%.*]]
	; ENABLED: latch.4:
	; ENABLED-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_3]]
	; ENABLED-NEXT: [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_4]], align 4
	; ENABLED-NEXT: [[ADD_4:%.*]] = add nsw i32 [[TMP6]], [[ADD_3]]
	; ENABLED-NEXT: [[INDVARS_IV_NEXT_4:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_3]], 1
	; ENABLED-NEXT: [[NITER_NEXT_4:%.*]] = add nuw nsw i64 [[NITER_NEXT_3]], 1
	; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK_5:%.*]]
	; ENABLED: for.exiting_block.5:
	; ENABLED-NEXT: [[CMP_5:%.*]] = icmp eq i64 [[N]], 42
	; ENABLED-NEXT: br i1 [[CMP_5]], label [[OTHEREXIT_LOOPEXIT]], label [[LATCH_5:%.*]]
	; ENABLED: latch.5:
	; ENABLED-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_4]]
	; ENABLED-NEXT: [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX_5]], align 4
	; ENABLED-NEXT: [[ADD_5:%.*]] = add nsw i32 [[TMP7]], [[ADD_4]]
	; ENABLED-NEXT: [[INDVARS_IV_NEXT_5:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_4]], 1
	; ENABLED-NEXT: [[NITER_NEXT_5:%.*]] = add nuw nsw i64 [[NITER_NEXT_4]], 1
	; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK_6:%.*]]
	; ENABLED: for.exiting_block.6:
	; ENABLED-NEXT: [[CMP_6:%.*]] = icmp eq i64 [[N]], 42
	; ENABLED-NEXT: br i1 [[CMP_6]], label [[OTHEREXIT_LOOPEXIT]], label [[LATCH_6:%.*]]
	; ENABLED: latch.6:
	; ENABLED-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_5]]
	; ENABLED-NEXT: [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX_6]], align 4
	; ENABLED-NEXT: [[ADD_6:%.*]] = add nsw i32 [[TMP8]], [[ADD_5]]
	; ENABLED-NEXT: [[INDVARS_IV_NEXT_6:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_5]], 1
	; ENABLED-NEXT: [[NITER_NEXT_6:%.*]] = add nuw nsw i64 [[NITER_NEXT_5]], 1
	; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK_7:%.*]]
	; ENABLED: for.exiting_block.7:
	; ENABLED-NEXT: [[CMP_7:%.*]] = icmp eq i64 [[N]], 42
	; ENABLED-NEXT: br i1 [[CMP_7]], label [[OTHEREXIT_LOOPEXIT]], label [[LATCH_7]]
	; ENABLED: latch.7:
	; ENABLED-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_6]]
	; ENABLED-NEXT: [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX_7]], align 4
	; ENABLED-NEXT: [[ADD_7]] = add nsw i32 [[TMP9]], [[ADD_6]]
	; ENABLED-NEXT: [[INDVARS_IV_NEXT_7]] = add i64 [[INDVARS_IV_NEXT_6]], 1
	; ENABLED-NEXT: [[NITER_NEXT_7]] = add i64 [[NITER_NEXT_6]], 1
	; ENABLED-NEXT: [[NITER_NCMP_7:%.*]] = icmp eq i64 [[NITER_NEXT_7]], [[UNROLL_ITER]]
	; ENABLED-NEXT: br i1 [[NITER_NCMP_7]], label [[LATCHEXIT_UNR_LCSSA_LOOPEXIT:%.*]], label [[HEADER]], !prof [[PROF4:![0-9]+]]
	; ENABLED: latchexit.unr-lcssa.loopexit:
	; ENABLED-NEXT: [[SUM_0_LCSSA_PH_PH:%.*]] = phi i32 [ [[ADD_7]], [[LATCH_7]] ]
	; ENABLED-NEXT: [[INDVARS_IV_UNR_PH:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_7]], [[LATCH_7]] ]
	; ENABLED-NEXT: [[SUM_02_UNR_PH:%.*]] = phi i32 [ [[ADD_7]], [[LATCH_7]] ]
	; ENABLED-NEXT: br label [[LATCHEXIT_UNR_LCSSA]]
	; ENABLED: latchexit.unr-lcssa:
	; ENABLED-NEXT: [[SUM_0_LCSSA_PH:%.*]] = phi i32 [ undef, [[ENTRY:%.*]] ], [ [[SUM_0_LCSSA_PH_PH]], [[LATCHEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; ENABLED-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[INDVARS_IV_UNR_PH]], [[LATCHEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; ENABLED-NEXT: [[SUM_02_UNR:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[SUM_02_UNR_PH]], [[LATCHEXIT_UNR_LCSSA_LOOPEXIT]] ]
	; ENABLED-NEXT: [[LCMP_MOD:%.*]] = icmp ne i64 [[XTRAITER]], 0
	; ENABLED-NEXT: br i1 [[LCMP_MOD]], label [[HEADER_EPIL_PREHEADER:%.*]], label [[LATCHEXIT:%.*]]
	; ENABLED: header.epil.preheader:
	; ENABLED-NEXT: br label [[HEADER_EPIL:%.*]]
	; ENABLED: header.epil:
	; ENABLED-NEXT: [[INDVARS_IV_EPIL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_EPIL:%.*]], [[LATCH_EPIL:%.*]] ], [ [[INDVARS_IV_UNR]], [[HEADER_EPIL_PREHEADER]] ]
	; ENABLED-NEXT: [[SUM_02_EPIL:%.*]] = phi i32 [ [[ADD_EPIL:%.*]], [[LATCH_EPIL]] ], [ [[SUM_02_UNR]], [[HEADER_EPIL_PREHEADER]] ]
	; ENABLED-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ 0, [[HEADER_EPIL_PREHEADER]] ], [ [[EPIL_ITER_NEXT:%.*]], [[LATCH_EPIL]] ]
	; ENABLED-NEXT: br label [[FOR_EXITING_BLOCK_EPIL:%.*]]
	; ENABLED: for.exiting_block.epil:
	; ENABLED-NEXT: [[CMP_EPIL:%.*]] = icmp eq i64 [[N]], 42
	; ENABLED-NEXT: br i1 [[CMP_EPIL]], label [[OTHEREXIT_LOOPEXIT2:%.*]], label [[LATCH_EPIL]]
	; ENABLED: latch.epil:
	; ENABLED-NEXT: [[ARRAYIDX_EPIL:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_EPIL]]
	; ENABLED-NEXT: [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX_EPIL]], align 4
	; ENABLED-NEXT: [[ADD_EPIL]] = add nsw i32 [[TMP10]], [[SUM_02_EPIL]]
	; ENABLED-NEXT: [[INDVARS_IV_NEXT_EPIL]] = add i64 [[INDVARS_IV_EPIL]], 1
	; ENABLED-NEXT: [[EXITCOND_EPIL:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_EPIL]], [[N]]
	; ENABLED-NEXT: [[EPIL_ITER_NEXT]] = add i64 [[EPIL_ITER]], 1
	; ENABLED-NEXT: [[EPIL_ITER_CMP:%.*]] = icmp ne i64 [[EPIL_ITER_NEXT]], [[XTRAITER]]
	; ENABLED-NEXT: br i1 [[EPIL_ITER_CMP]], label [[HEADER_EPIL]], label [[LATCHEXIT_EPILOG_LCSSA:%.*]], !prof [[PROF5:![0-9]+]], !llvm.loop [[LOOP6:![0-9]+]]
	; ENABLED: latchexit.epilog-lcssa:
	; ENABLED-NEXT: [[SUM_0_LCSSA_PH1:%.*]] = phi i32 [ [[ADD_EPIL]], [[LATCH_EPIL]] ]
	; ENABLED-NEXT: br label [[LATCHEXIT]]
	; ENABLED: latchexit:			; ENABLED: latchexit:
	; ENABLED-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[SUM_0_LCSSA_PH]], [[LATCHEXIT_UNR_LCSSA]] ], [ [[SUM_0_LCSSA_PH1]], [[LATCHEXIT_EPILOG_LCSSA]] ]			; ENABLED-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[ADD]], [[LATCH]] ]
	; ENABLED-NEXT: ret i32 [[SUM_0_LCSSA]]			; ENABLED-NEXT: ret i32 [[SUM_0_LCSSA]]
	; ENABLED: otherexit.loopexit:
	; ENABLED-NEXT: br label [[OTHEREXIT:%.*]]
	; ENABLED: otherexit.loopexit2:
	; ENABLED-NEXT: br label [[OTHEREXIT]]
	; ENABLED: otherexit:			; ENABLED: otherexit:
	; ENABLED-NEXT: ret i32 57			; ENABLED-NEXT: ret i32 57
	;			;
	entry:			entry:
	br label %header			br label %header

	header:			header:
	%indvars.iv = phi i64 [ %indvars.iv.next, %latch ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %latch ], [ 0, %entry ]
	[...]