Details

Reviewers

Meinersbur
bmahjour
reames
fhahn

Commits

rG40391cef6164: [LoopUnrollRuntime] Add option to assume the non latch exit block to be
rG58d531fd6f04: [LoopUnrollRuntime] Add option to assume the non latch exit block to be

Summary

Add option -unroll-runtime-other-exit-predictable to assume the non latch exit block to be predictable.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	470 ms	x64 debian > LLVM.Transforms/LoopUnroll::runtime-loop-at-most-two-exits.ll
	450 ms	x64 windows > LLVM.Transforms/LoopUnroll::runtime-loop-at-most-two-exits.ll

Event Timeline

Whitney created this revision.Mar 1 2021, 4:41 PM

Herald added subscribers: zzheng, hiraditya. Mar 1 2021, 4:41 PM

Whitney requested review of this revision.Mar 1 2021, 4:41 PM

Herald added a subscriber: llvm-commits. Mar 1 2021, 4:41 PM

Harbormaster completed remote builds in B91475: Diff 327321.Mar 1 2021, 10:25 PM

bmahjour added inline comments.Mar 2 2021, 8:42 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
507	IIUC this is basically trying to avoid the check for the "deoptimize" call on line 514, right? In that case can we just add this check to that line and update the comments on line 512 with something like: ". When `UnrollRuntimeAtMostTwoExits` is specified we assume the other exit branch is predictable even if it has no deoptimize call"?
llvm/test/Transforms/LoopUnroll/runtime-loop-at-most-two-exits.ll
99	how come it is still unrolled?

Fixed LIT

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
507	When `UnrollRuntimeAtMostTwoExits` is specified, we also allow `OtherExits.size() == 0`.
llvm/test/Transforms/LoopUnroll/runtime-loop-at-most-two-exits.ll
99	I should remove `-unroll-count=2`

update comment.

bmahjour added inline comments.Mar 2 2021, 10:48 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
507	I would have expected that when OtherExits is empty, we are not in a multi-exit-loop scenario so the result of this function won't matter...but I guess it's better to be cautious and explicitly handle that case.

Harbormaster completed remote builds in B91603: Diff 327497.Mar 2 2021, 11:38 AM

Harbormaster completed remote builds in B91606: Diff 327499.Mar 2 2021, 11:50 AM

What is the motivation for the switch? Why wouldn't we want to unroll loops with 3 or more exits, provided we can and the heuristic says it's profitable? Why is the cutoff at 3? Could we also pass the cutoff point as an option?

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
499–502	`OtherExits.size() > 1` will also make the `getTerminatingDeoptimizeCall` heuristic fail, so this is not a semantics change. However, I find the structure confusing, especially if we want to add more conditions that enable the optimization. Did you consider the following structure?
	if (OtherExits.size() <= 1 && UnrollRuntimeAtMostTwoExits)
	  return true;
	if (OtherExits.size() == 1 && OtherExits[0]->getTerminatingDeoptimizeCall())
	  return true;
	// More conditions can be added here
	return false;

Whitney updated this revision to Diff 327773.Mar 3 2021, 6:38 AM

Whitney marked 3 inline comments as done.

Whitney edited the summary of this revision.

bmahjour added inline comments.Mar 3 2021, 7:03 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
57	I think we can avoid this extra option by initializing the Cutoff to 0. Any number larger than 0 would mean a valid cutoff. (ie 0 means no cutoff enabled, 1 means no multi-exit, 2 means at most 2 exits, etc). It would also render `UnrollRuntimeMultiExit` obsolete, which can be replaced in this patch or a subsequent one.

Whitney updated this revision to Diff 327792.Mar 3 2021, 7:25 AM

Whitney added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
57	Why is `UnrollRuntimeMultiExit` obsolete? Do you mean the user of `UnrollRuntimeMultiExit` can put a very big number?
499–502	What do you think of the current structure? I have added a cutoff option.

bmahjour added inline comments.Mar 3 2021, 7:38 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
57	Yes. I suppose a value of say 10 would be big enough for most cases....the bigger the number of exiting blocks, the less likely it will be profitable to unroll.

Whitney edited the summary of this revision. Mar 3 2021, 7:42 AM

Whitney retitled this revision from [LoopUnrollRuntime] Add option to unroll loops with at most two exit/exiting blocks. to [LoopUnrollRuntime] Add option to unroll loops with at most a specified number of exit/exiting blocks..

Whitney marked 2 inline comments as done.

Whitney added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
57	OK, I can put up a review for that after this patch lands.

Meinersbur added inline comments.Mar 3 2021, 8:24 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
490–492	I think @bmahjour's suggestion (and mine) was to completely disable `UnrollRuntimeCutoffPoint` if it is `0`:
	if (UnrollRuntimeCutoffPoint == 0 || (ExitingBlocks.size() <= UnrollRuntimeCutoffPoint && OtherExits.size() < UnrollRuntimeCutoffPoint))
	  return true;
	I don't really understand why it bounds both `ExitingBlocks` and `OtherExits` (instead of only one of them or the sum of such blocks). Could you explain?
	@bmahjour's other point is that it makes `UnrollRuntimeMultiExit` redundant: `-unroll-runtime-multi-exit=true` is equivalent to `-unroll-runtime-cutoff-point=0` and `-unroll-runtime-multi-exit=false` is equivalent to `-unroll-runtime-cutoff-point=1` (I think in this case `canProfitablyUnrollMultiExitLoop` would not be called anyway). Do I understand correctly that you would remove `-unroll-runtime-multi-exit` in a follow-up patch?

bmahjour added inline comments.Mar 3 2021, 8:39 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
57	> Yes. I suppose a value of say 10 would be big enough for most cases....the bigger the number of exiting blocks, the less likely it will be profitable to unroll.
	...just to clarify this, as Michael pointed out, my original intention was to have 0 mean no cutoff (i.e. any number of exit/exiting blocks is allowed)....but the same thing can effectively be achieved with a large value specified for the option.

Meinersbur added inline comments.Mar 3 2021, 9:18 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
57	IMHO having 0 as no cutoff is more idiomatic than having an arbitrarily large cutoff (if 0 otherwise has no meaning). Alternatively, `-1`, interpreted as an unsigned number, would be the largest possible value, sometimes used to represent "infinity".
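The two conventions weighed in this thread can be compared in a small standalone sketch. The helper names (`withinCutoffZeroUnlimited`, `withinCutoffSentinel`, `kNoCutoff`) are hypothetical and not part of LLVM; the code only illustrates how a cutoff of 0, or of `-1` converted to unsigned, could each encode "no limit".

```cpp
#include <cstddef>
#include <limits>

// Convention A (hypothetical helper): a cutoff of 0 means "no cutoff";
// any positive value is an upper bound on the number of exiting blocks.
bool withinCutoffZeroUnlimited(std::size_t numExitingBlocks, unsigned cutoff) {
  return cutoff == 0 || numExitingBlocks <= cutoff;
}

// Convention B (hypothetical): the largest unsigned value, i.e. what -1
// becomes when converted to unsigned, stands for "infinity".
constexpr unsigned kNoCutoff = std::numeric_limits<unsigned>::max();

bool withinCutoffSentinel(std::size_t numExitingBlocks, unsigned cutoff) {
  // No special case is needed: every realistic count is <= kNoCutoff.
  return numExitingBlocks <= cutoff;
}
```

Convention A needs one extra comparison but leaves every positive value meaningful; convention B keeps the check uniform at the cost of a magic value on the command line.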

Whitney updated this revision to Diff 327835.Mar 3 2021, 9:52 AM

Whitney marked 3 inline comments as done.

Whitney retitled this revision from [LoopUnrollRuntime] Add option to unroll loops with at most a specified number of exit/exiting blocks. to [LoopUnrollRuntime] Add option to assume the non latch exit block to be predictable..

Whitney edited the summary of this revision.

Whitney added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
490–492	Current behaviour:
	-unroll-runtime-multi-exit==true => return true
	-unroll-runtime-multi-exit==false => return false
	-unroll-runtime-multi-exit not specified => return (ExitingBlocks.size() <= 2 && OtherExits.size() == 1 && OtherExits[0]->getTerminatingDeoptimizeCall())
	If we use this code:
	if (UnrollRuntimeCutoffPoint == 0 || (ExitingBlocks.size() <= UnrollRuntimeCutoffPoint && OtherExits.size() < UnrollRuntimeCutoffPoint))
	  return true;
	then:
	-unroll-runtime-cutoff-point==0 => return true
	-unroll-runtime-cutoff-point==1 => return (ExitingBlocks.size() <= 1 && OtherExits.size() < 1) || (ExitingBlocks.size() <= 2 && OtherExits.size() == 1 && OtherExits[0]->getTerminatingDeoptimizeCall())
	There is no initial value that can keep the current behaviour of (ExitingBlocks.size() <= 2 && OtherExits.size() == 1 && OtherExits[0]->getTerminatingDeoptimizeCall()).
	You are right that there is no direct relation between the number of exiting blocks and other exits, so we would have to use two cutoff points.
	After thinking about it more, to achieve my original need, which is to allow unrolling a loop with at most 2 exiting blocks and at most 1 other exit block, I added an option to assume the other exit is predictable.
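The decision described in this comment, combined with the new option, can be modelled outside LLVM as follows. This is a sketch under stated assumptions: `MultiExitFlag`, `ExitBlockModel`, and `canProfitablyUnrollModel` are hypothetical stand-ins, and the order of the checks follows the description above and Diff 327848 rather than the full upstream function, part of which is not shown on this page.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the three states of -unroll-runtime-multi-exit.
enum class MultiExitFlag { NotSpecified, True, False };

// Hypothetical stand-in for an exit block; it records only whether the block
// ends in a deoptimize call (what getTerminatingDeoptimizeCall() tests).
struct ExitBlockModel {
  bool endsInDeoptimizeCall;
};

bool canProfitablyUnrollModel(MultiExitFlag multiExit,
                              std::size_t numExitingBlocks,
                              const std::vector<ExitBlockModel> &otherExits,
                              bool otherExitPredictable) {
  // An explicitly given -unroll-runtime-multi-exit value decides on its own.
  if (multiExit != MultiExitFlag::NotSpecified)
    return multiExit == MultiExitFlag::True;
  // More than two exiting blocks: too many branches after unrolling.
  if (numExitingBlocks > 2)
    return false;
  // No non-latch exit blocks at all: allow unrolling.
  if (otherExits.empty())
    return true;
  // Exactly one other exit, either assumed predictable via the new option
  // or known to be rarely taken because it ends in a deoptimize call.
  return otherExits.size() == 1 &&
         (otherExitPredictable || otherExits[0].endsInDeoptimizeCall);
}
```

With the option off, a single non-deoptimize side exit is rejected; with it on, the same loop is accepted, which is the difference the ENABLED and DISABLED check prefixes in the test are meant to capture.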

bmahjour added inline comments.Mar 3 2021, 10:03 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
490–492	ok, this diff is now more like what I had in mind in the beginning (see https://reviews.llvm.org/D97747#inline-916962). Could you please also update the comments on line 506?

Add comment.

Whitney marked an inline comment as done.Mar 3 2021, 10:17 AM

Meinersbur added inline comments.Mar 3 2021, 10:41 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
499–501	This seems to make sense, but isn't this already covered by `!L->getExitingBlock() || OtherExits.size()` in the function that calls `canProfitablyUnrollMultiExitLoop`? (I think we could keep it here, just to make it more explicit)
510	Looks sensible to me. @bmahjour ?

Whitney marked 2 inline comments as done.Mar 3 2021, 10:51 AM

bmahjour accepted this revision.Mar 3 2021, 10:55 AM

bmahjour added inline comments.

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
510	LGTM too.

This revision is now accepted and ready to land.Mar 3 2021, 10:55 AM

Meinersbur accepted this revision.Mar 3 2021, 11:06 AM

Harbormaster completed remote builds in B91806: Diff 327773.Mar 3 2021, 11:32 AM

This revision was landed with ongoing or failed builds.Mar 3 2021, 12:43 PM

Closed by commit rG58d531fd6f04: [LoopUnrollRuntime] Add option to assume the non latch exit block to be (authored by Whitney).

This revision was automatically updated to reflect the committed changes.

Whitney added a commit: rG58d531fd6f04: [LoopUnrollRuntime] Add option to assume the non latch exit block to be.

Harbormaster completed remote builds in B91817: Diff 327792.Mar 3 2021, 1:15 PM

Harbormaster completed remote builds in B91846: Diff 327835.Mar 3 2021, 5:19 PM

Harbormaster completed remote builds in B91854: Diff 327848.Mar 3 2021, 6:26 PM

Whitney added a commit: rG40391cef6164: [LoopUnrollRuntime] Add option to assume the non latch exit block to be.Mar 7 2021, 3:50 PM

Diff 327848

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp

[44 lines not shown]
#define DEBUG_TYPE "loop-unroll"		#define DEBUG_TYPE "loop-unroll"

STATISTIC(NumRuntimeUnrolled,		STATISTIC(NumRuntimeUnrolled,
"Number of loops unrolled with run-time trip counts");		"Number of loops unrolled with run-time trip counts");
static cl::opt<bool> UnrollRuntimeMultiExit(		static cl::opt<bool> UnrollRuntimeMultiExit(
"unroll-runtime-multi-exit", cl::init(false), cl::Hidden,		"unroll-runtime-multi-exit", cl::init(false), cl::Hidden,
cl::desc("Allow runtime unrolling for loops with multiple exits, when "		cl::desc("Allow runtime unrolling for loops with multiple exits, when "
"epilog is generated"));		"epilog is generated"));
		static cl::opt<bool> UnrollRuntimeOtherExitPredictable(
		"unroll-runtime-other-exit-predictable", cl::init(false), cl::Hidden,
		cl::desc("Assume the non latch exit block to be predictable"));

/// Connect the unrolling prolog code to the original loop.		/// Connect the unrolling prolog code to the original loop.
/// The unrolling prolog code contains code to execute the		/// The unrolling prolog code contains code to execute the
/// 'extra' iterations if the run-time trip count modulo the		/// 'extra' iterations if the run-time trip count modulo the
/// unroll count is non-zero.		/// unroll count is non-zero.
///		///
/// This function performs the following:		/// This function performs the following:
/// - Create PHI nodes at prolog end block to combine values		/// - Create PHI nodes at prolog end block to combine values
/// that exit the prolog code and jump around the prolog.		/// that exit the prolog code and jump around the prolog.
/// - Add a PHI operand to a PHI node at the loop exit block		/// - Add a PHI operand to a PHI node at the loop exit block
[416 lines not shown]
#endif
// The second point is the increase in code size, but this is true		// The second point is the increase in code size, but this is true
// irrespective of multiple exits.		// irrespective of multiple exits.

// Note: Both the heuristics below are coarse grained. We are essentially		// Note: Both the heuristics below are coarse grained. We are essentially
// enabling unrolling of loops that have a single side exit other than the		// enabling unrolling of loops that have a single side exit other than the
// normal LatchExit (i.e. exiting into a deoptimize block).		// normal LatchExit (i.e. exiting into a deoptimize block).
// The heuristics considered are:		// The heuristics considered are:
// 1. low number of branches in the unrolled version.		// 1. low number of branches in the unrolled version.
// 2. high predictability of these extra branches.		// 2. high predictability of these extra branches.
// We avoid unrolling loops that have more than two exiting blocks. This		// We avoid unrolling loops that have more than two exiting blocks. This
// limits the total number of branches in the unrolled loop to be atmost		// limits the total number of branches in the unrolled loop to be atmost
// the unroll factor (since one of the exiting blocks is the latch block).		// the unroll factor (since one of the exiting blocks is the latch block).
SmallVector<BasicBlock*, 4> ExitingBlocks;		SmallVector<BasicBlock*, 4> ExitingBlocks;
L->getExitingBlocks(ExitingBlocks);		L->getExitingBlocks(ExitingBlocks);
if (ExitingBlocks.size() > 2)		if (ExitingBlocks.size() > 2)
return false;		return false;

		// Allow unrolling of loops with no non latch exit blocks.
		if (OtherExits.size() == 0)
		return true;

// The second heuristic is that L has one exit other than the latchexit and		// The second heuristic is that L has one exit other than the latchexit and
// that exit is a deoptimize block. We know that deoptimize blocks are rarely		// that exit is a deoptimize block. We know that deoptimize blocks are rarely
// taken, which also implies the branch leading to the deoptimize block is		// taken, which also implies the branch leading to the deoptimize block is
// highly predictable.		// highly predictable. When UnrollRuntimeOtherExitPredictable is specified, we
		// assume the other exit branch is predictable even if it has no deoptimize
		// call.
return (OtherExits.size() == 1 &&		return (OtherExits.size() == 1 &&
OtherExits[0]->getTerminatingDeoptimizeCall());		(UnrollRuntimeOtherExitPredictable ||
		OtherExits[0]->getTerminatingDeoptimizeCall()));
// TODO: These can be fine-tuned further to consider code size or deopt states		// TODO: These can be fine-tuned further to consider code size or deopt states
// that are captured by the deoptimize exit block.		// that are captured by the deoptimize exit block.
// Also, we can extend this to support more cases, if we actually		// Also, we can extend this to support more cases, if we actually
// know of kinds of multiexit loops that would benefit from unrolling.		// know of kinds of multiexit loops that would benefit from unrolling.
}		}

// Assign the maximum possible trip count as the back edge weight for the		// Assign the maximum possible trip count as the back edge weight for the
// remainder loop if the original loop comes with a branch weight.		// remainder loop if the original loop comes with a branch weight.
[480 lines not shown]

llvm/test/Transforms/LoopUnroll/runtime-loop-at-most-two-exits.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt < %s -loop-unroll -unroll-runtime=true -unroll-runtime-epilog=true -unroll-runtime-other-exit-predictable=true -verify-loop-lcssa -verify-dom-info -verify-loop-info -S | FileCheck %s --check-prefix=ENABLED
				; RUN: opt < %s -loop-unroll -unroll-runtime=true -unroll-runtime-epilog=true -unroll-runtime-other-exit-predictable=false -verify-loop-lcssa -verify-dom-info -verify-loop-info -S | FileCheck %s --check-prefix=DISABLED

				define i32 @test(i32* nocapture %a, i64 %n) {
				; ENABLED-LABEL: @test(
				; ENABLED-NEXT: entry:
				; ENABLED-NEXT: [[TMP0:%.*]] = add i64 [[N:%.*]], -1
				; ENABLED-NEXT: [[XTRAITER:%.*]] = and i64 [[N]], 7
				; ENABLED-NEXT: [[TMP1:%.*]] = icmp ult i64 [[TMP0]], 7
				; ENABLED-NEXT: br i1 [[TMP1]], label [[FOR_END_UNR_LCSSA:%.*]], label [[ENTRY_NEW:%.*]]
				; ENABLED: entry.new:
				; ENABLED-NEXT: [[UNROLL_ITER:%.*]] = sub i64 [[N]], [[XTRAITER]]
				; ENABLED-NEXT: br label [[HEADER:%.*]]
				; ENABLED: header:
				; ENABLED-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ 0, [[ENTRY_NEW]] ], [ [[INDVARS_IV_NEXT_7:%.*]], [[FOR_BODY_7:%.*]] ]
				; ENABLED-NEXT: [[SUM_02:%.*]] = phi i32 [ 0, [[ENTRY_NEW]] ], [ [[ADD_7:%.*]], [[FOR_BODY_7]] ]
				; ENABLED-NEXT: [[NITER:%.*]] = phi i64 [ [[UNROLL_ITER]], [[ENTRY_NEW]] ], [ [[NITER_NSUB_7:%.*]], [[FOR_BODY_7]] ]
				; ENABLED-NEXT: [[CMP:%.*]] = icmp eq i64 [[N]], 42
				; ENABLED-NEXT: br i1 [[CMP]], label [[FOR_EXIT2_LOOPEXIT:%.*]], label [[FOR_BODY:%.*]]
				; ENABLED: for.body:
				; ENABLED-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDVARS_IV]]
				; ENABLED-NEXT: [[TMP2:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; ENABLED-NEXT: [[ADD:%.*]] = add nsw i32 [[TMP2]], [[SUM_02]]
				; ENABLED-NEXT: [[INDVARS_IV_NEXT:%.*]] = add nuw nsw i64 [[INDVARS_IV]], 1
				; ENABLED-NEXT: [[NITER_NSUB:%.*]] = sub i64 [[NITER]], 1
				; ENABLED-NEXT: [[CMP_1:%.*]] = icmp eq i64 [[N]], 42
				; ENABLED-NEXT: br i1 [[CMP_1]], label [[FOR_EXIT2_LOOPEXIT]], label [[FOR_BODY_1:%.*]]
				; ENABLED: for.end.unr-lcssa.loopexit:
				; ENABLED-NEXT: [[SUM_0_LCSSA_PH_PH:%.*]] = phi i32 [ [[ADD_7]], [[FOR_BODY_7]] ]
				; ENABLED-NEXT: [[INDVARS_IV_UNR_PH:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_7]], [[FOR_BODY_7]] ]
				; ENABLED-NEXT: [[SUM_02_UNR_PH:%.*]] = phi i32 [ [[ADD_7]], [[FOR_BODY_7]] ]
				; ENABLED-NEXT: br label [[FOR_END_UNR_LCSSA]]
				; ENABLED: for.end.unr-lcssa:
				; ENABLED-NEXT: [[SUM_0_LCSSA_PH:%.*]] = phi i32 [ undef, [[ENTRY:%.*]] ], [ [[SUM_0_LCSSA_PH_PH]], [[FOR_END_UNR_LCSSA_LOOPEXIT:%.*]] ]
				; ENABLED-NEXT: [[INDVARS_IV_UNR:%.*]] = phi i64 [ 0, [[ENTRY]] ], [ [[INDVARS_IV_UNR_PH]], [[FOR_END_UNR_LCSSA_LOOPEXIT]] ]
				; ENABLED-NEXT: [[SUM_02_UNR:%.*]] = phi i32 [ 0, [[ENTRY]] ], [ [[SUM_02_UNR_PH]], [[FOR_END_UNR_LCSSA_LOOPEXIT]] ]
				; ENABLED-NEXT: [[LCMP_MOD:%.*]] = icmp ne i64 [[XTRAITER]], 0
				; ENABLED-NEXT: br i1 [[LCMP_MOD]], label [[HEADER_EPIL_PREHEADER:%.*]], label [[FOR_END:%.*]]
				; ENABLED: header.epil.preheader:
				; ENABLED-NEXT: br label [[HEADER_EPIL:%.*]]
				; ENABLED: header.epil:
				; ENABLED-NEXT: [[INDVARS_IV_EPIL:%.*]] = phi i64 [ [[INDVARS_IV_NEXT_EPIL:%.*]], [[FOR_BODY_EPIL:%.*]] ], [ [[INDVARS_IV_UNR]], [[HEADER_EPIL_PREHEADER]] ]
				; ENABLED-NEXT: [[SUM_02_EPIL:%.*]] = phi i32 [ [[ADD_EPIL:%.*]], [[FOR_BODY_EPIL]] ], [ [[SUM_02_UNR]], [[HEADER_EPIL_PREHEADER]] ]
				; ENABLED-NEXT: [[EPIL_ITER:%.*]] = phi i64 [ [[XTRAITER]], [[HEADER_EPIL_PREHEADER]] ], [ [[EPIL_ITER_SUB:%.*]], [[FOR_BODY_EPIL]] ]
				; ENABLED-NEXT: [[CMP_EPIL:%.*]] = icmp eq i64 [[N]], 42
				; ENABLED-NEXT: br i1 [[CMP_EPIL]], label [[FOR_EXIT2_LOOPEXIT2:%.*]], label [[FOR_BODY_EPIL]]
				; ENABLED: for.body.epil:
				; ENABLED-NEXT: [[ARRAYIDX_EPIL:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_EPIL]]
				; ENABLED-NEXT: [[TMP3:%.*]] = load i32, i32* [[ARRAYIDX_EPIL]], align 4
				; ENABLED-NEXT: [[ADD_EPIL]] = add nsw i32 [[TMP3]], [[SUM_02_EPIL]]
				; ENABLED-NEXT: [[INDVARS_IV_NEXT_EPIL]] = add i64 [[INDVARS_IV_EPIL]], 1
				; ENABLED-NEXT: [[EXITCOND_EPIL:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT_EPIL]], [[N]]
				; ENABLED-NEXT: [[EPIL_ITER_SUB]] = sub i64 [[EPIL_ITER]], 1
				; ENABLED-NEXT: [[EPIL_ITER_CMP:%.*]] = icmp ne i64 [[EPIL_ITER_SUB]], 0
				; ENABLED-NEXT: br i1 [[EPIL_ITER_CMP]], label [[HEADER_EPIL]], label [[FOR_END_EPILOG_LCSSA:%.*]], [[LOOP0:!llvm.loop !.*]]
				; ENABLED: for.end.epilog-lcssa:
				; ENABLED-NEXT: [[SUM_0_LCSSA_PH1:%.*]] = phi i32 [ [[ADD_EPIL]], [[FOR_BODY_EPIL]] ]
				; ENABLED-NEXT: br label [[FOR_END]]
				; ENABLED: for.end:
				; ENABLED-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[SUM_0_LCSSA_PH]], [[FOR_END_UNR_LCSSA]] ], [ [[SUM_0_LCSSA_PH1]], [[FOR_END_EPILOG_LCSSA]] ]
				; ENABLED-NEXT: ret i32 [[SUM_0_LCSSA]]
				; ENABLED: for.exit2.loopexit:
				; ENABLED-NEXT: [[RETVAL_PH:%.*]] = phi i32 [ [[SUM_02]], [[HEADER]] ], [ [[ADD]], [[FOR_BODY]] ], [ [[ADD_1:%.*]], [[FOR_BODY_1]] ], [ [[ADD_2:%.*]], [[FOR_BODY_2:%.*]] ], [ [[ADD_3:%.*]], [[FOR_BODY_3:%.*]] ], [ [[ADD_4:%.*]], [[FOR_BODY_4:%.*]] ], [ [[ADD_5:%.*]], [[FOR_BODY_5:%.*]] ], [ [[ADD_6:%.*]], [[FOR_BODY_6:%.*]] ]
				; ENABLED-NEXT: br label [[FOR_EXIT2:%.*]]
				; ENABLED: for.exit2.loopexit2:
				; ENABLED-NEXT: [[RETVAL_PH3:%.*]] = phi i32 [ [[SUM_02_EPIL]], [[HEADER_EPIL]] ]
				; ENABLED-NEXT: br label [[FOR_EXIT2]]
				; ENABLED: for.exit2:
				; ENABLED-NEXT: [[RETVAL:%.*]] = phi i32 [ [[RETVAL_PH]], [[FOR_EXIT2_LOOPEXIT]] ], [ [[RETVAL_PH3]], [[FOR_EXIT2_LOOPEXIT2]] ]
				; ENABLED-NEXT: ret i32 [[RETVAL]]
				; ENABLED: for.body.1:
				; ENABLED-NEXT: [[ARRAYIDX_1:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT]]
				; ENABLED-NEXT: [[TMP4:%.*]] = load i32, i32* [[ARRAYIDX_1]], align 4
				; ENABLED-NEXT: [[ADD_1]] = add nsw i32 [[TMP4]], [[ADD]]
				; ENABLED-NEXT: [[INDVARS_IV_NEXT_1:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT]], 1
				; ENABLED-NEXT: [[NITER_NSUB_1:%.*]] = sub i64 [[NITER_NSUB]], 1
				; ENABLED-NEXT: [[CMP_2:%.*]] = icmp eq i64 [[N]], 42
				; ENABLED-NEXT: br i1 [[CMP_2]], label [[FOR_EXIT2_LOOPEXIT]], label [[FOR_BODY_2]]
				; ENABLED: for.body.2:
				; ENABLED-NEXT: [[ARRAYIDX_2:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_1]]
				; ENABLED-NEXT: [[TMP5:%.*]] = load i32, i32* [[ARRAYIDX_2]], align 4
				; ENABLED-NEXT: [[ADD_2]] = add nsw i32 [[TMP5]], [[ADD_1]]
				; ENABLED-NEXT: [[INDVARS_IV_NEXT_2:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_1]], 1
				; ENABLED-NEXT: [[NITER_NSUB_2:%.*]] = sub i64 [[NITER_NSUB_1]], 1
				; ENABLED-NEXT: [[CMP_3:%.*]] = icmp eq i64 [[N]], 42
				; ENABLED-NEXT: br i1 [[CMP_3]], label [[FOR_EXIT2_LOOPEXIT]], label [[FOR_BODY_3]]
				; ENABLED: for.body.3:
				; ENABLED-NEXT: [[ARRAYIDX_3:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_2]]
				; ENABLED-NEXT: [[TMP6:%.*]] = load i32, i32* [[ARRAYIDX_3]], align 4
				; ENABLED-NEXT: [[ADD_3]] = add nsw i32 [[TMP6]], [[ADD_2]]
				; ENABLED-NEXT: [[INDVARS_IV_NEXT_3:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_2]], 1
				; ENABLED-NEXT: [[NITER_NSUB_3:%.*]] = sub i64 [[NITER_NSUB_2]], 1
				; ENABLED-NEXT: [[CMP_4:%.*]] = icmp eq i64 [[N]], 42
				; ENABLED-NEXT: br i1 [[CMP_4]], label [[FOR_EXIT2_LOOPEXIT]], label [[FOR_BODY_4]]
				; ENABLED: for.body.4:
				; ENABLED-NEXT: [[ARRAYIDX_4:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_3]]
				; ENABLED-NEXT: [[TMP7:%.*]] = load i32, i32* [[ARRAYIDX_4]], align 4
				; ENABLED-NEXT: [[ADD_4]] = add nsw i32 [[TMP7]], [[ADD_3]]
				; ENABLED-NEXT: [[INDVARS_IV_NEXT_4:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_3]], 1
				; ENABLED-NEXT: [[NITER_NSUB_4:%.*]] = sub i64 [[NITER_NSUB_3]], 1
				; ENABLED-NEXT: [[CMP_5:%.*]] = icmp eq i64 [[N]], 42
				; ENABLED-NEXT: br i1 [[CMP_5]], label [[FOR_EXIT2_LOOPEXIT]], label [[FOR_BODY_5]]
				; ENABLED: for.body.5:
				; ENABLED-NEXT: [[ARRAYIDX_5:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_4]]
				; ENABLED-NEXT: [[TMP8:%.*]] = load i32, i32* [[ARRAYIDX_5]], align 4
				; ENABLED-NEXT: [[ADD_5]] = add nsw i32 [[TMP8]], [[ADD_4]]
				; ENABLED-NEXT: [[INDVARS_IV_NEXT_5:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_4]], 1
				; ENABLED-NEXT: [[NITER_NSUB_5:%.*]] = sub i64 [[NITER_NSUB_4]], 1
				; ENABLED-NEXT: [[CMP_6:%.*]] = icmp eq i64 [[N]], 42
				; ENABLED-NEXT: br i1 [[CMP_6]], label [[FOR_EXIT2_LOOPEXIT]], label [[FOR_BODY_6]]
				; ENABLED: for.body.6:
				; ENABLED-NEXT: [[ARRAYIDX_6:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_5]]
				; ENABLED-NEXT: [[TMP9:%.*]] = load i32, i32* [[ARRAYIDX_6]], align 4
				; ENABLED-NEXT: [[ADD_6]] = add nsw i32 [[TMP9]], [[ADD_5]]
				; ENABLED-NEXT: [[INDVARS_IV_NEXT_6:%.*]] = add nuw nsw i64 [[INDVARS_IV_NEXT_5]], 1
				; ENABLED-NEXT: [[NITER_NSUB_6:%.*]] = sub i64 [[NITER_NSUB_5]], 1
				; ENABLED-NEXT: [[CMP_7:%.*]] = icmp eq i64 [[N]], 42
				; ENABLED-NEXT: br i1 [[CMP_7]], label [[FOR_EXIT2_LOOPEXIT]], label [[FOR_BODY_7]]
				; ENABLED: for.body.7:
				; ENABLED-NEXT: [[ARRAYIDX_7:%.*]] = getelementptr inbounds i32, i32* [[A]], i64 [[INDVARS_IV_NEXT_6]]
				; ENABLED-NEXT: [[TMP10:%.*]] = load i32, i32* [[ARRAYIDX_7]], align 4
				; ENABLED-NEXT: [[ADD_7]] = add nsw i32 [[TMP10]], [[ADD_6]]
				; ENABLED-NEXT: [[INDVARS_IV_NEXT_7]] = add i64 [[INDVARS_IV_NEXT_6]], 1
				; ENABLED-NEXT: [[NITER_NSUB_7]] = sub i64 [[NITER_NSUB_6]], 1
				; ENABLED-NEXT: [[NITER_NCMP_7:%.*]] = icmp eq i64 [[NITER_NSUB_7]], 0
				; ENABLED-NEXT: br i1 [[NITER_NCMP_7]], label [[FOR_END_UNR_LCSSA_LOOPEXIT]], label [[HEADER]]
				;
				; DISABLED-LABEL: @test(
				; DISABLED-NEXT: entry:
				; DISABLED-NEXT: br label [[HEADER:%.*]]
				; DISABLED: header:
				; DISABLED-NEXT: [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], [[FOR_BODY:%.*]] ], [ 0, [[ENTRY:%.*]] ]
				; DISABLED-NEXT: [[SUM_02:%.*]] = phi i32 [ [[ADD:%.*]], [[FOR_BODY]] ], [ 0, [[ENTRY]] ]
				; DISABLED-NEXT: [[CMP:%.*]] = icmp eq i64 [[N:%.*]], 42
				; DISABLED-NEXT: br i1 [[CMP]], label [[FOR_EXIT2:%.*]], label [[FOR_BODY]]
				; DISABLED: for.body:
				; DISABLED-NEXT: [[ARRAYIDX:%.*]] = getelementptr inbounds i32, i32* [[A:%.*]], i64 [[INDVARS_IV]]
				; DISABLED-NEXT: [[TMP0:%.*]] = load i32, i32* [[ARRAYIDX]], align 4
				; DISABLED-NEXT: [[ADD]] = add nsw i32 [[TMP0]], [[SUM_02]]
				; DISABLED-NEXT: [[INDVARS_IV_NEXT]] = add i64 [[INDVARS_IV]], 1
				; DISABLED-NEXT: [[EXITCOND:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[N]]
				; DISABLED-NEXT: br i1 [[EXITCOND]], label [[FOR_END:%.*]], label [[HEADER]]
				; DISABLED: for.end:
				; DISABLED-NEXT: [[SUM_0_LCSSA:%.*]] = phi i32 [ [[ADD]], [[FOR_BODY]] ]
				; DISABLED-NEXT: ret i32 [[SUM_0_LCSSA]]
				; DISABLED: for.exit2:
				; DISABLED-NEXT: [[RETVAL:%.*]] = phi i32 [ [[SUM_02]], [[HEADER]] ]
				; DISABLED-NEXT: ret i32 [[RETVAL]]
				;
				entry:
				br label %header

				header:
				%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
				%sum.02 = phi i32 [ %add, %for.body ], [ 0, %entry ]
				%cmp = icmp eq i64 %n, 42
				br i1 %cmp, label %for.exit2, label %for.body

				for.body:
				%arrayidx = getelementptr inbounds i32, i32* %a, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4
				%add = add nsw i32 %0, %sum.02
				%indvars.iv.next = add i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, %n
				br i1 %exitcond, label %for.end, label %header

				for.end:
				%sum.0.lcssa = phi i32 [ %add, %for.body ]
				ret i32 %sum.0.lcssa

				for.exit2:
				%retval = phi i32 [ %sum.02, %header ]
				ret i32 %retval
				}
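For readers less used to LLVM IR, the `@test` function above corresponds roughly to the C++ below. This is a hand-written approximation, not the source the IR was generated from; the name `sumWithSideExit` is hypothetical. Like the IR, it assumes `n > 0` whenever `n != 42`, because the latch test `i == n` runs only after the first element has been read.

```cpp
#include <cstdint>

// Rough C++ equivalent of @test: sum a[0..n-1], with a side exit taken from
// the loop header when n == 42.
std::int32_t sumWithSideExit(const std::int32_t *a, std::uint64_t n) {
  std::int32_t sum = 0;
  std::uint64_t i = 0;
  for (;;) {
    // header: the non-latch ("other") exit, going to for.exit2.
    if (n == 42)
      return sum;
    // for.body: accumulate one element and advance the index.
    sum += a[i];
    ++i;
    // Latch exit, going to for.end.
    if (i == n)
      return sum;
  }
}
```

Because `n` does not change inside the loop, the side-exit branch goes the same way on every iteration, so it is a predictable branch even though it does not lead to a deoptimize block.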

This is an archive of the discontinued LLVM Phabricator instance.

[LoopUnrollRuntime] Add option to assume the non latch exit block to be predictable.
Closed · Public
