This is an archive of the discontinued LLVM Phabricator instance.

Differential D158642

LoopUnrollRuntime: Add weights to all branches
ClosedPublic

Authored by MatzeB on Aug 23 2023, 10:35 AM.

Details

Reviewers

wenlei
davidxl
hoy
paulkirth
dnovillo
mtrofin
xur

Commits

rGb30c9c937802: LoopUnrollRuntime: Add weights to all branches

Summary

Make sure every conditional branch constructed by LoopUnrollRuntime code sets branch weights.

Add new 1:127 weights for the conditional jumps that check whether the whole (unrolled) loop should be skipped in the generated prolog or epilog code.
Remove the updateLatchBranchWeightsForRemainderLoop function and instead add weights immediately when constructing the relevant branches. This simplifies the code and makes it more obvious, as every call to CreateCondBr now has a BranchWeights parameter.
Rework the formula for the epilogue latch weights to assume an equal distribution of remainders, and remove an assert (I was able to reach this code when forcing small unroll factors on the command line).

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

MatzeB created this revision. Aug 23 2023, 10:35 AM

Herald added a project: Restricted Project. Aug 23 2023, 10:35 AM

Herald added subscribers: modimo, zzheng, hiraditya, mcrosier.

MatzeB requested review of this revision. Aug 23 2023, 10:35 AM

Herald added a subscriber: llvm-commits.

MatzeB retitled this revision from "LoopUnrollRuntime: Add branch weights to newly created branches" to "LoopUnrollRuntime: Add weights to all branches". Aug 23 2023, 10:38 AM

I guess we may need a discussion about how to express these "nearly always taken" / "nearly never taken" branch weights. The hardcoded 1:127 ratio in the current patch is pretty arbitrary.
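For readers new to `branch_weights`: the weights are relative, so a successor's probability is roughly its weight divided by the sum of all weights on that branch. The plain C++ sketch below (the name `successorProbability` is illustrative, not an LLVM API) shows that 1:127 works out to 1/128, about 0.8%, for the rarely taken successor and 127/128, about 99.2%, for the other. In the ConnectProlog hunk of the diff, for instance, the branch around the unrolled loop gives weight 1 to the exit and 127 to the new preheader. LLVM itself uses fixed-point `BranchProbability` values, so this floating-point version only approximates what the analyses compute.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Probability that successor `Index` of a conditional branch is taken, given
// branch_weights-style relative weights listed in successor order.
// Illustrative helper only: it uses double instead of LLVM's fixed-point
// BranchProbability.
double successorProbability(const std::vector<uint32_t> &Weights,
                            std::size_t Index) {
  assert(Index < Weights.size() && "successor index out of range");
  // Accumulate in 64 bits so that large 32-bit weights cannot overflow.
  const uint64_t Total =
      std::accumulate(Weights.begin(), Weights.end(), uint64_t{0});
  assert(Total != 0 && "all-zero weights carry no information");
  return static_cast<double>(Weights[Index]) / static_cast<double>(Total);
}
```

For example, `successorProbability({1, 127}, 0)` returns 0.0078125 (exactly 1/128) and `successorProbability({1, 127}, 1)` returns 0.9921875.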

Looking around, I see LikelyBranchWeight in llvm/Transforms/Scalar/LowerExpectIntrinsic.cpp, but it's private to the pass, and a warning advises using TargetTransformInfo::getPredictableBranchThreshold() instead. Intuitively, it feels wrong to me to have branch_weights change based on the selected target here, and I am not sure I follow why LowerExpectIntrinsic gets to be an exception and not use getPredictableBranchThreshold() then...

CC @lebedev.ri @spatel

Some history in

Maybe we should expose the "LikelyBranchWeight" of LowerExpectIntrinsic.cpp in an internal LLVM header? That way frontends are still forced to use the llvm.expect abstractions, while LLVM-internal code can do the quicker thing and add likely branch weights directly...

MatzeB mentioned this in D157462: LoopRotate: Add code to update branch weights. Aug 23 2023, 11:25 AM

Harbormaster completed remote builds in B254395: Diff 552790. Aug 23 2023, 12:25 PM

MatzeB mentioned this in D158668: RFC: Add getLikelyBranchWeight helper function. Aug 23 2023, 1:48 PM

mtrofin added a reviewer: xur. Aug 24 2023, 9:44 AM

Nits.

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Lines 792–798: Should this logic be factored into a `getBranchWeights(Latch, Count)` helper, or something like that?

This revision is now accepted and ready to land. Aug 24 2023, 9:48 AM

MatzeB mentioned this in D159322: LoopVectorize: Set branch_weight for conditional branches. Aug 31 2023, 6:05 PM

mtrofin added inline comments. Sep 1 2023, 7:48 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 796: One thought about the 1:127 ratio: how about we had a `MDBuilder::createUnrolledLoopSkipWeights()` doing the setting? More readable, the API captures the intent, etc.

hoy added inline comments. Sep 1 2023, 9:54 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 796: Can this be computed based on the loop preheader count and the main loop trip count? IIUC, the block execution count of the preheader indicates how many times the loop is entered from outside the loop. Thus the loop trip count divided by the preheader count indicates an average trip count per loop execution. If that is greater than the unrolling factor, then the unrolled loop should always be executed?
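Hoy's proposal can be written down as a small calculation: divide the loop header's profile count by the preheader (entry) count to get an average trip count per entry, then compare it with the unroll factor. The sketch below is plain C++ with hypothetical names and raw counts instead of LLVM's profile APIs. It also illustrates the limitation MatzeB raises in his reply: an average says nothing about how many individual entries have a trip count below the unroll factor.

```cpp
#include <cassert>
#include <cstdint>

// Average number of header executions per entry into the loop, estimated
// from aggregated profile counts (header count / preheader count).
double averageTripCount(uint64_t HeaderCount, uint64_t PreheaderCount) {
  assert(PreheaderCount != 0 && "loop is never entered in this profile");
  return static_cast<double>(HeaderCount) /
         static_cast<double>(PreheaderCount);
}

// The heuristic from the review: if the average trip count reaches the unroll
// factor, assume a typical entry runs the unrolled body at least once.
bool unrolledBodyLikelyRuns(uint64_t HeaderCount, uint64_t PreheaderCount,
                            unsigned UnrollFactor) {
  return averageTripCount(HeaderCount, PreheaderCount) >= UnrollFactor;
}
```

With 100 loop entries and 4300 header executions the average is 43, which clears an unroll factor of 4. The same counts also arise from 50 entries of 85 iterations plus 50 entries of 1 iteration, where half of the entries skip the unrolled body, so the average alone cannot recover the skip probability.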

MatzeB added inline comments. Sep 1 2023, 12:53 PM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Lines 792–798: And put it where? I find that having the branch weight decisions right next to the code creating the branch helps readability of the code as you don't have to jump around.
Line 796:
> how about we had a `MDBuilder::createUnrolledLoopSkipWeights()`

As above I think this hurts readability as it moves the logic away into another file. Also if we decide this being the preferred method, then I'd rather see a mass-refactoring of relevant places rather than starting with one-offs in diffs.

> Can this be computed based on the loop preheader count and the main loop trip count?

I don't see how. The old preheader (if it existed at all) will give you the ratio of zero and non-zero trip counts. But this here needs to catch the cases of trip-counts being smaller than the unroll factor...

mtrofin added inline comments. Sep 1 2023, 1:13 PM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 796:
>> how about we had a `MDBuilder::createUnrolledLoopSkipWeights()`
> As above I think this hurts readability as it moves the logic away into another file. Also if we decide this being the preferred method, then I'd rather see a mass-refactoring of relevant places rather than starting with one-offs in diffs.

I can see your point for the above, but here, the way I see it, the choice of numbers and the intent behind them is not immediately obvious, so an API would make that clear. At minimum, the values could be a constant that's reused? (Maybe the `ArrayRef` overload of `createBranchWeights` would take that constant, or something like that.) If this set of weights is used this way elsewhere, doing a refactoring afterwards makes sense, but (maybe I searched superficially) I can't seem to find them other than in your recent patches on this topic; that's why I was thinking this may be the opportune moment. WDYT?

hoy added inline comments. Sep 1 2023, 1:49 PM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 796:
> I don't see how. The old preheader (if it existed at all) will give you the ratio of zero and non-zero trip counts. But this here needs to catch the cases of trip-counts being smaller than the unroll factor...

The old preheader can be just some block dominating the loop header, for the sake of block execution count. Right, the branch compares the trip count of each loop entry with the unroll factor. But to estimate that probability at compile time, I feel we can compare the aggregated loop trip count (i.e., profile count) divided by the preheader count with the unroll factor. Let me know if this feels wrong.

hoy added inline comments. Sep 1 2023, 3:19 PM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 796: Talked offline. The average trip count is already used to check if loop unroll is beneficial. Here we are pretty sure the unrolled loop body is going to be run, so giving a 99.9...% ratio is reasonable.

Factor out branch weight constants.

MatzeB marked 2 inline comments as done. Sep 1 2023, 3:53 PM

Harbormaster completed remote builds in B256368: Diff 555524. Sep 1 2023, 4:34 PM

wenlei added inline comments. Sep 4 2023, 2:35 PM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 61: nit: in the comment also explain why it's unlikely for loop to not enter unrolled version at all. Like does the unroll only happen with known trip count (which is bigger than unroll factor)?
Line 62: It's good to have constants extracted out, but when we do that, we can try to make them meaningful when someone just looks at the constants without looking at the uses. If {1,127} means low probability for not entering unrolled loop at all, it'd be confusing for the same {1,127} to also mean high probability for epilog loop to be executed. The easiest fix is to add a comment for each individual weight.
Line 310: nit: you mean `[0, count)`?
Lines 400–403: I know this is just refactoring from `updateLatchBranchWeightsForRemainderLoop`, but I'm wondering if the assumption is reasonable? If we assume linear distribution for loop trip counts, then the average trip count for the remainder should be `Count / 2`, rather than `Count - 1`?

MatzeB added inline comments. Sep 11 2023, 11:04 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 61:
> in the comment also explain why it's unlikely for loop to not enter unrolled version at all. Like does the unroll only happen with known trip count (which is bigger than unroll factor)?

The header deals with the amount of iterations not fitting into the unroll factor. i.e. if the loop requires 43 iterations and you have an unroll factor of 4, then the header will do 3 iterations and we enter the unrolled loop to process the remaining 40 iterations (that are now 10 unrolled ones). However I think this is the wrong place to explain how this pass works. It's just a constant used in the algorithm. I'd rather not try to write explanations of how the algorithm works here; we should have that at the beginning of the file and in the functions performing the transformations. The policy questions you hint at are in `LoopUnrollPass.cpp`; this file is mostly about mechanisms. I have a feeling what you are really asking about here is some reasoning why "127" is a good value. And honestly, I don't have a deeper analysis behind this, except for looking at some small toy examples. This should be clearly better than the default 50:50 split we end up with when not adding anything, so I hope we can leave tuning for later (if necessary).
Line 62:
> If {1,127} means low probability for not entering unrolled loop at all, it'd be confusing for the same {1,127} to also mean high probability for epilog loop to be executed.

This is about entering a loop epilogue, not the unrolled loop. And you are reading this correctly in that it should be a low probability. The comment already states this. The code either produces a prologue or an epilogue to deal with the extra iterations that don't fit the unrolled loop. Though again this seems like the wrong place to explain the intricacies of this pass, it's just some constants...
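MatzeB's 43-iteration example is integer division: with runtime unrolling by a factor UF, the remainder loop (prolog or epilog) runs TripCount % UF iterations and the unrolled body runs TripCount / UF times. The plain C++ sketch below (hypothetical names, not LLVM code) also makes the rare case behind the {1, 127} header weights visible: when the trip count is smaller than UF, the unrolled body runs zero times and the branch around it is taken.

```cpp
#include <cassert>

// How runtime unrolling splits a trip count between the remainder loop
// (prolog or epilog) and the unrolled loop body.
struct RuntimeUnrollSplit {
  unsigned RemainderIterations; // iterations handled by the remainder loop
  unsigned UnrolledIterations;  // executions of the unrolled body
};

RuntimeUnrollSplit splitTripCount(unsigned TripCount, unsigned UnrollFactor) {
  assert(UnrollFactor > 0 && "unroll factor must be positive");
  return {TripCount % UnrollFactor, TripCount / UnrollFactor};
}

// True when the unrolled body is skipped entirely, i.e. the rarely taken
// side of the branch that the {1, 127} header weights describe.
bool skipsUnrolledLoop(unsigned TripCount, unsigned UnrollFactor) {
  return splitTripCount(TripCount, UnrollFactor).UnrolledIterations == 0;
}
```

`splitTripCount(43, 4)` yields 3 remainder iterations and 10 unrolled iterations (covering the remaining 40), matching the example, and `skipsUnrolledLoop(3, 4)` is true.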

MatzeB added inline comments. Sep 11 2023, 11:16 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Lines 400–403: CC @hoy, who originally added this (D83187). I think it would be `(Count / 2) - 1` for a linear distribution. I guess I can change it as part of this patch...

MatzeB added inline comments. Sep 11 2023, 11:56 AM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Lines 400–403: Given that there is a pre-header to rule out zero trip counts, and that this is a loop backedge, the average should be `(Count - 2) / 2`.
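The `(Count - 2) / 2` average can be checked by brute force. Assuming, as in the review, that the remainder r is uniformly distributed over [1, Count) (a zero remainder never enters the remainder loop), a remainder loop that runs r iterations takes its back-edge r - 1 times and its exit once. Summing r - 1 over all Count - 1 remainders gives (Count - 2)(Count - 1) / 2, and dividing by Count - 1 leaves (Count - 2) / 2. The sketch below (hypothetical function names) compares that exact average with the integer expression used in the diff.

```cpp
#include <cassert>

// Exact average number of back-edge executions per entry into the remainder
// loop, assuming every remainder in [1, Count) is equally likely. A remainder
// loop running R iterations takes its back-edge R - 1 times.
double averageRemainderBackEdges(unsigned Count) {
  assert(Count >= 2 && "need at least one non-zero remainder");
  unsigned long long Sum = 0;
  for (unsigned R = 1; R < Count; ++R)
    Sum += R - 1;
  return static_cast<double>(Sum) / static_cast<double>(Count - 1);
}

// The back-edge weight as computed in the diff (with ExitWeight = 1), using
// unsigned integer division, so odd values of Count round down.
unsigned backEdgeWeightAsInDiff(unsigned Count) {
  assert(Count >= 3 && "the diff only uses this formula for Count >= 3");
  return (Count - 2) / 2;
}
```

For Count = 8 both give 3. For Count = 3 the exact average is 0.5, while the integer expression yields a back-edge weight of 0, so for small odd unroll factors the integer weight rounds the average down.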

wenlei added inline comments. Sep 11 2023, 1:24 PM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 61:
> The header deals with the amount of iterations not fitting into the unroll factor. i.e. if the loop requires 43 iterations and you have an unroll factor of 4, then the header will do 3 iterations and we enter the unrolled loop to process the remaining 40 iterations (that are now 10 unrolled ones).

This is not answering the question; instead, this is explaining how the pass works.

> However I think this is the wrong place to explain how this pass works.

We do not need to explain how the pass works in order to explain why specific values are chosen. :) We can assume some knowledge of how the pass works from the readers.

> I have a feeling what you are really asking about here is some reasoning why "127" is a good value.

Partially. But what I'm really asking is to put an explanation for the value you chose in the comment. Just saying that this is generally unlikely, and that it's a ballpark estimate, not yet tuned, can also be helpful (so people know they have more freedom to change it). I actually think we should put something like Hongtao's comment here:

> The average trip count is already used to check if loop unroll is beneficial. Here we are pretty sure the unrolled loop body is going to be run, so giving a 99.9...% ratio is reasonable.

Line 62:
> This is about entering a loop epilogue, not the unrolled loop.

For the unrolled loop, I was referring to the UnrolledLoopHeaderWeights above, which shares the same value {1,127}. Sorry for causing confusion.

> And you are reading this correctly in that it should be a low probability.

No, I read it as a high probability to enter the loop epilog. Given that there is a `1 / UF` chance of the loop trip count being an exact multiple of `UF`, we enter the loop epilog with a probability of `(UF - 1) / UF`. Am I missing something here? And btw, if I'm missing something obvious here, the answer doesn't need to go into comments.

> Though again this seems like the wrong place to explain the intricacies of this pass, it's just some constants...

I guess I care less about where this is explained. But I think there needs to be an explanation for the values chosen somewhere. Currently there is no explanation. However, given the constants are now extracted out, I think the most natural way is to have the explanation go along with the constants/values. We don't need to explain the intricacies of this pass, but IMHO we need enough context for people to understand why these values were chosen, and how confident the original author was about the chosen value. For such an explanation, we can assume certain knowledge of the pass. Staring at some values and scratching our heads just isn't a good experience, so I'd err on the side of over-communicating when it comes to chosen constants.
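Wenlei's estimate assumes that trip counts are spread uniformly, so that exactly one in every UF consecutive trip counts is a multiple of UF. Under that assumption the fraction of trip counts that leave a non-zero remainder, and therefore need the epilog loop, is (UF - 1) / UF. The sketch below (hypothetical function name, not LLVM code) counts this directly over a whole number of periods of UF.

```cpp
#include <cassert>

// Fraction of trip counts in [0, Periods * UnrollFactor) whose remainder
// modulo UnrollFactor is non-zero, i.e. that would need the epilog loop.
// Counting over whole periods makes every remainder equally frequent.
double fractionNeedingEpilog(unsigned UnrollFactor, unsigned Periods) {
  assert(UnrollFactor > 0 && Periods > 0);
  const unsigned Total = UnrollFactor * Periods;
  unsigned NonZeroRemainders = 0;
  for (unsigned TripCount = 0; TripCount < Total; ++TripCount)
    if (TripCount % UnrollFactor != 0)
      ++NonZeroRemainders;
  return static_cast<double>(NonZeroRemainders) / static_cast<double>(Total);
}
```

With UnrollFactor = 4 this returns 0.75, and with UnrollFactor = 8 it returns 0.875, matching (UF - 1) / UF.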

Rework formula for prolog/epilog loop-backedge weights as suggested by @wenlei.

MatzeB marked 2 inline comments as done. Sep 11 2023, 1:32 PM

MatzeB added inline comments. Sep 11 2023, 1:45 PM

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Line 62: Sorry if this is confusing; the test here is whether the loop trip count is so small that we skip the unrolled loop and enter the epilogue immediately. Trying to make the comment clearer now...

Tweak comments for weight constants.

MatzeB edited the summary of this revision. Sep 11 2023, 1:49 PM

lgtm, thanks!

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp
Lines 400–403: Makes sense. I didn't consider the fact that it's a backedge.

This revision was landed with ongoing or failed builds. Sep 11 2023, 2:26 PM

Closed by commit rGb30c9c937802: LoopUnrollRuntime: Add weights to all branches (authored by MatzeB).

This revision was automatically updated to reflect the committed changes.

MatzeB marked 3 inline comments as done.

MatzeB added a commit: rGb30c9c937802: LoopUnrollRuntime: Add weights to all branches.

Revision Contents

Path: Size
llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp: 96 lines
llvm/test/Transforms/LoopUnroll/runtime-exit-phi-scev-invalidation.ll: 8 lines
llvm/test/Transforms/LoopUnroll/runtime-loop-branchweight.ll: 2 lines
llvm/test/Transforms/LoopUnroll/runtime-loop.ll: 67 lines
llvm/test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll: 16 lines

Diff 556479

llvm/lib/Transforms/Utils/LoopUnrollRuntime.cpp

@@ … @@
 static cl::opt<bool> UnrollRuntimeMultiExit(
     "unroll-runtime-multi-exit", cl::init(false), cl::Hidden,
     cl::desc("Allow runtime unrolling for loops with multiple exits, when "
              "epilog is generated"));
 static cl::opt<bool> UnrollRuntimeOtherExitPredictable(
     "unroll-runtime-other-exit-predictable", cl::init(false), cl::Hidden,
     cl::desc("Assume the non latch exit block to be predictable"));

+// Probability that the loop trip count is so small that after the prolog
+// we do not enter the unrolled loop at all.
+static const uint32_t UnrolledLoopHeaderWeights[] = {1, 127};
+// Probability that the epilogue loop will be executed at all.
+static const uint32_t EpilogHeaderWeights[] = {1, 127};

 /// Connect the unrolling prolog code to the original loop.
 /// The unrolling prolog code contains code to execute the
 /// 'extra' iterations if the run-time trip count modulo the
 /// unroll count is non-zero.
 ///
 /// This function performs the following:
 /// - Create PHI nodes at prolog end block to combine values
 /// that exit the prolog code and jump around the prolog.
@@ … @@ static void ConnectProlog(Loop *L, Value *BECount, unsigned Count,
   // then (BECount + 1) cannot unsigned-overflow.
   Value *BrLoopExit =
       B.CreateICmpULT(BECount, ConstantInt::get(BECount->getType(), Count - 1));
   // Split the exit to maintain loop canonicalization guarantees
   SmallVector<BasicBlock *, 4> Preds(predecessors(OriginalLoopLatchExit));
   SplitBlockPredecessors(OriginalLoopLatchExit, Preds, ".unr-lcssa", DT, LI,
                          nullptr, PreserveLCSSA);
   // Add the branch to the exit block (around the unrolled loop)
-  B.CreateCondBr(BrLoopExit, OriginalLoopLatchExit, NewPreHeader);
+  MDNode *BranchWeights = nullptr;
+  if (hasBranchWeightMD(*Latch->getTerminator())) {
+    // Assume loop is nearly always entered.
+    MDBuilder MDB(B.getContext());
+    BranchWeights = MDB.createBranchWeights(UnrolledLoopHeaderWeights);
+  }
+  B.CreateCondBr(BrLoopExit, OriginalLoopLatchExit, NewPreHeader,
+                 BranchWeights);
   InsertPt->eraseFromParent();
   if (DT) {
     auto *NewDom = DT->findNearestCommonDominator(OriginalLoopLatchExit,
                                                   PrologExit);
     DT->changeImmediateDominator(OriginalLoopLatchExit, NewDom);
   }
 }

 /// Connect the unrolling epilog code to the original loop.
 /// The unrolling epilog code contains code to execute the
 /// 'extra' iterations if the run-time trip count modulo the
 /// unroll count is non-zero.
 ///
 /// This function performs the following:
 /// - Update PHI nodes at the unrolling loop exit and epilog loop exit
 /// - Create PHI nodes at the unrolling loop exit to combine
 /// values that exit the unrolling loop code and jump around it.
 /// - Update PHI operands in the epilog loop by the new PHI nodes
 /// - Branch around the epilog loop if extra iters (ModVal) is zero.
 ///
 static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
                           BasicBlock *Exit, BasicBlock *PreHeader,
                           BasicBlock *EpilogPreHeader, BasicBlock *NewPreHeader,
                           ValueToValueMapTy &VMap, DominatorTree *DT,
-                          LoopInfo *LI, bool PreserveLCSSA,
-                          ScalarEvolution &SE) {
+                          LoopInfo *LI, bool PreserveLCSSA, ScalarEvolution &SE,
+                          unsigned Count) {
   BasicBlock *Latch = L->getLoopLatch();
   assert(Latch && "Loop must have a latch");
   BasicBlock *EpilogLatch = cast<BasicBlock>(VMap[Latch]);

   // Loop structure should be the following:
   //
   // PreHeader
   // NewPreHeader
@@ … @@ static void ConnectEpilog(Loop *L, Value *ModVal, BasicBlock *NewExit,
   IRBuilder<> B(InsertPt);
   Value *BrLoopExit = B.CreateIsNotNull(ModVal, "lcmp.mod");
   assert(Exit && "Loop must have a single exit block only");
   // Split the epilogue exit to maintain loop canonicalization guarantees
   SmallVector<BasicBlock*, 4> Preds(predecessors(Exit));
   SplitBlockPredecessors(Exit, Preds, ".epilog-lcssa", DT, LI, nullptr,
                          PreserveLCSSA);
   // Add the branch to the exit block (around the unrolling loop)
-  B.CreateCondBr(BrLoopExit, EpilogPreHeader, Exit);
+  MDNode *BranchWeights = nullptr;
+  if (hasBranchWeightMD(*Latch->getTerminator())) {
+    // Assume equal distribution in interval [0, Count).
+    MDBuilder MDB(B.getContext());
+    BranchWeights = MDB.createBranchWeights(1, Count - 1);
+  }
+  B.CreateCondBr(BrLoopExit, EpilogPreHeader, Exit, BranchWeights);
   InsertPt->eraseFromParent();
   if (DT) {
     auto *NewDom = DT->findNearestCommonDominator(Exit, NewExit);
     DT->changeImmediateDominator(Exit, NewDom);
   }

   // Split the main loop exit to maintain canonicalization guarantees.
   SmallVector<BasicBlock*, 4> NewExitPreds{Latch};
   SplitBlockPredecessors(NewExit, NewExitPreds, ".loopexit", DT, LI, nullptr,
                          PreserveLCSSA);
 }

 /// Create a clone of the blocks in a loop and connect them together. A new
 /// loop will be created including all cloned blocks, and the iterator of the
 /// new loop switched to count NewIter down to 0.
 /// The cloned blocks should be inserted between InsertTop and InsertBot.
 /// InsertTop should be new preheader, InsertBot new loop exit.
 /// Returns the new cloned loop that is created.
 static Loop *
 CloneLoopBlocks(Loop *L, Value *NewIter, const bool UseEpilogRemainder,
                 const bool UnrollRemainder,
                 BasicBlock *InsertTop,
                 BasicBlock *InsertBot, BasicBlock *Preheader,
-                std::vector<BasicBlock *> &NewBlocks, LoopBlocksDFS &LoopBlocks,
-                ValueToValueMapTy &VMap, DominatorTree *DT, LoopInfo *LI) {
+                std::vector<BasicBlock *> &NewBlocks,
+                LoopBlocksDFS &LoopBlocks, ValueToValueMapTy &VMap,
+                DominatorTree *DT, LoopInfo *LI, unsigned Count) {
   StringRef suffix = UseEpilogRemainder ? "epil" : "prol";
   BasicBlock *Header = L->getHeader();
   BasicBlock *Latch = L->getLoopLatch();
   Function *F = Header->getParent();
   LoopBlocksDFS::RPOIterator BlockBegin = LoopBlocks.beginRPO();
   LoopBlocksDFS::RPOIterator BlockEnd = LoopBlocks.endRPO();
   Loop *ParentLoop = L->getParentLoop();
   NewLoopsMap NewLoops;
@@ … @@ if (Latch == *BB) {
       PHINode *NewIdx =
           PHINode::Create(NewIter->getType(), 2, suffix + ".iter");
       NewIdx->insertBefore(FirstLoopBB->getFirstNonPHIIt());
       auto *Zero = ConstantInt::get(NewIdx->getType(), 0);
       auto *One = ConstantInt::get(NewIdx->getType(), 1);
       Value *IdxNext =
           Builder.CreateAdd(NewIdx, One, NewIdx->getName() + ".next");
       Value *IdxCmp = Builder.CreateICmpNE(IdxNext, NewIter, NewIdx->getName() + ".cmp");
-      Builder.CreateCondBr(IdxCmp, FirstLoopBB, InsertBot);
+      MDNode *BranchWeights = nullptr;
+      if (hasBranchWeightMD(*LatchBR)) {
+        uint32_t ExitWeight;
+        uint32_t BackEdgeWeight;
+        if (Count >= 3) {
+          // Note: We do not enter this loop for zero-remainders. The check
+          // is at the end of the loop. We assume equal distribution between
+          // possible remainders in [1, Count).
+          ExitWeight = 1;
+          BackEdgeWeight = (Count - 2) / 2;
		wenlei: I know this is just refactoring from `updateLatchBranchWeightsForRemainderLoop`, but I'm wondering if the assumption is reasonable? If we assume linear distribution for loop trip counts, then the average trip count for the remainder should be `Count / 2`, rather than `Count - 1`?
		MatzeB (author): CC @hoy who originally added this (D83187). I think it would be `(Count / 2) - 1` for linear distribution. Guess I can change it as part of this patch...
		MatzeB (author): Given there is a pre-header to rule out zero-trip counts and this being a loop backedge, the average should be `(Count - 2) / 2`.
		wenlei: Makes sense. I didn't consider the fact that it's a backedge.
		} else {
		// Unnecessary backedge, should never be taken. The conditional
		// jump should be optimized away later.
		ExitWeight = 1;
		BackEdgeWeight = 0;
		}
		MDBuilder MDB(Builder.getContext());
		BranchWeights = MDB.createBranchWeights(BackEdgeWeight, ExitWeight);
		}
		Builder.CreateCondBr(IdxCmp, FirstLoopBB, InsertBot, BranchWeights);
NewIdx->addIncoming(Zero, InsertTop);		NewIdx->addIncoming(Zero, InsertTop);
NewIdx->addIncoming(IdxNext, NewBB);		NewIdx->addIncoming(IdxNext, NewBB);
LatchBR->eraseFromParent();		LatchBR->eraseFromParent();
}		}
}		}
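The new remainder-latch weights follow from a short expected-value argument: if the remainder `r` is assumed uniformly distributed over `[1, Count)`, a remainder loop that runs `r` iterations takes its backedge `r - 1` times and exits once, so the mean backedge count is `E[r] - 1 = Count / 2 - 1 = (Count - 2) / 2`. The sketch below restates the `Count >= 3` / `else` split from `CloneLoopBlocks` as standalone C++ and compares it with a brute-force average; the names `LatchWeights`, `remainderLatchWeights` and `averageBackedgesBruteForce` are hypothetical, not LLVM API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type: {backedge weight, exit weight}, in the same order as
// MDB.createBranchWeights(BackEdgeWeight, ExitWeight) in the diff.
struct LatchWeights {
  uint32_t BackEdge;
  uint32_t Exit;
};

// Mirrors the rule in the new CloneLoopBlocks code.
LatchWeights remainderLatchWeights(unsigned Count) {
  if (Count >= 3)
    return {(Count - 2) / 2, 1}; // mean of (r - 1) for r uniform in [1, Count)
  return {0, 1};                 // Count <= 2: the backedge is never taken
}

// Brute-force mean backedge count over r = 1 .. Count-1, truncated the same
// way as the integer formula. Returns 0 when no nonzero remainder exists.
uint32_t averageBackedgesBruteForce(unsigned Count) {
  assert(Count >= 1 && "unroll count must be positive");
  if (Count < 2)
    return 0;
  uint64_t Sum = 0;
  for (unsigned R = 1; R < Count; ++R)
    Sum += R - 1; // a remainder of r takes the backedge r - 1 times
  return static_cast<uint32_t>(Sum / (Count - 1));
}
```

For `Count = 4` this yields `{1, 1}`, which matches the updated `[[#PROF2]]` expectation in `runtime-loop-branchweight.ll` (run with `-unroll-count=4`); `Count = 8` yields `{3, 1}`.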

// Change the incoming values to the ones defined in the preheader or		// Change the incoming values to the ones defined in the preheader or
// cloned loop.		// cloned loop.
[77 lines not shown]	return (OtherExits.size() == 1 &&
(UnrollRuntimeOtherExitPredictable ||		(UnrollRuntimeOtherExitPredictable ||
OtherExits[0]->getPostdominatingDeoptimizeCall()));		OtherExits[0]->getPostdominatingDeoptimizeCall()));
// TODO: These can be fine-tuned further to consider code size or deopt states		// TODO: These can be fine-tuned further to consider code size or deopt states
// that are captured by the deoptimize exit block.		// that are captured by the deoptimize exit block.
// Also, we can extend this to support more cases, if we actually		// Also, we can extend this to support more cases, if we actually
// know of kinds of multiexit loops that would benefit from unrolling.		// know of kinds of multiexit loops that would benefit from unrolling.
}		}

// Assign the maximum possible trip count as the back edge weight for the
// remainder loop if the original loop comes with a branch weight.
static void updateLatchBranchWeightsForRemainderLoop(Loop *OrigLoop,
Loop *RemainderLoop,
uint64_t UnrollFactor) {
uint64_t TrueWeight, FalseWeight;
BranchInst *LatchBR =
cast<BranchInst>(OrigLoop->getLoopLatch()->getTerminator());
if (!extractBranchWeights(*LatchBR, TrueWeight, FalseWeight))
return;
uint64_t ExitWeight = LatchBR->getSuccessor(0) == OrigLoop->getHeader()
? FalseWeight
: TrueWeight;
assert(UnrollFactor > 1);
uint64_t BackEdgeWeight = (UnrollFactor - 1) * ExitWeight;
BasicBlock *Header = RemainderLoop->getHeader();
BasicBlock *Latch = RemainderLoop->getLoopLatch();
auto *RemainderLatchBR = cast<BranchInst>(Latch->getTerminator());
unsigned HeaderIdx = (RemainderLatchBR->getSuccessor(0) == Header ? 0 : 1);
MDBuilder MDB(RemainderLatchBR->getContext());
MDNode *WeightNode =
HeaderIdx ? MDB.createBranchWeights(ExitWeight, BackEdgeWeight)
: MDB.createBranchWeights(BackEdgeWeight, ExitWeight);
RemainderLatchBR->setMetadata(LLVMContext::MD_prof, WeightNode);
}
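For comparison, the removed helper scaled the original loop's exit weight, `BackEdgeWeight = (UnrollFactor - 1) * ExitWeight`, i.e. it assumed the remainder loop always runs its maximum number of iterations. A minimal sketch of both rules side by side (function names are hypothetical): with an original exit weight of 1 and `Count = 4`, the old rule gives `{3, 1}` and the new one `{1, 1}`, which is the change from `i32 3, i32 1` to `i32 1, i32 1` in `runtime-loop-branchweight.ll`.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// {backedge weight, exit weight} for the remainder loop latch.
using Weights = std::pair<uint64_t, uint64_t>;

// Old rule (removed updateLatchBranchWeightsForRemainderLoop): assume the
// maximum remainder, scaled by the original latch's exit weight.
Weights oldRemainderWeights(uint64_t UnrollFactor, uint64_t OrigExitWeight) {
  assert(UnrollFactor > 1); // mirrors the assert in the removed helper
  return {(UnrollFactor - 1) * OrigExitWeight, OrigExitWeight};
}

// New rule (CloneLoopBlocks in this diff): uniform remainders in [1, Count),
// independent of the original loop's weights.
Weights newRemainderWeights(uint64_t Count) {
  if (Count >= 3)
    return {(Count - 2) / 2, 1};
  return {0, 1};
}
```

The new rule no longer depends on the magnitude of the original weights, which is why it can be computed directly where the branch is created.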

/// Calculate ModVal = (BECount + 1) % Count on the abstract integer domain		/// Calculate ModVal = (BECount + 1) % Count on the abstract integer domain
/// accounting for the possibility of unsigned overflow in the 2s complement		/// accounting for the possibility of unsigned overflow in the 2s complement
/// domain. Preconditions:		/// domain. Preconditions:
/// 1) TripCount = BECount + 1 (allowing overflow)		/// 1) TripCount = BECount + 1 (allowing overflow)
/// 2) Log2(Count) <= BitWidth(BECount)		/// 2) Log2(Count) <= BitWidth(BECount)
static Value *CreateTripRemainder(IRBuilder<> &B, Value *BECount,		static Value *CreateTripRemainder(IRBuilder<> &B, Value *BECount,
Value *TripCount, unsigned Count) {		Value *TripCount, unsigned Count) {
// Note that TripCount is BECount + 1.		// Note that TripCount is BECount + 1.
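The body of `CreateTripRemainder` is elided above. As a scalar illustration of why the overflow caveat in its doc comment matters, the sketch below assumes an 8-bit `BECount` (chosen only to make wrap-around easy to hit) and computes `(BECount + 1) % Count` in two ways: for a power-of-two `Count` it masks the possibly wrapped `TripCount`, which stays correct because `2^8` is a multiple of `Count`; otherwise it uses `((BECount % Count) + 1) % Count`, which never forms the wrapped sum. All function names are hypothetical; this models the arithmetic, not the IR the pass emits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of ModVal = (BECount + 1) % Count when the trip
// count BECount + 1 is formed in 8 bits and may wrap to 0.
// Precondition (as in the doc comment): Log2(Count) <= 8, i.e. Count <= 256.
unsigned tripRemainder8(uint8_t BECount, unsigned Count) {
  assert(Count >= 1 && Count <= 256);
  uint8_t TripCount = static_cast<uint8_t>(BECount + 1); // may wrap to 0
  if ((Count & (Count - 1)) == 0)
    // Power of two: a wrapped TripCount means the true count is 256, a
    // multiple of Count, so masking the wrapped value is still correct.
    return TripCount & (Count - 1);
  // BECount % Count < Count, so adding 1 cannot overflow; the final % Count
  // folds the value Count back to 0.
  return (BECount % Count + 1) % Count;
}

// Reference computed in a wider type, where BECount + 1 cannot wrap.
unsigned tripRemainderWide(uint8_t BECount, unsigned Count) {
  return (static_cast<unsigned>(BECount) + 1) % Count;
}

// Naive version: remainder of the wrapped 8-bit trip count.
unsigned tripRemainderNaive8(uint8_t BECount, unsigned Count) {
  uint8_t TripCount = static_cast<uint8_t>(BECount + 1);
  return TripCount % Count;
}
```

With `BECount = 255` the true trip count is 256, so for `Count = 3` the remainder is 1; the naive version sees a wrapped trip count of 0 and returns 0.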
[269 lines not shown]	bool llvm::UnrollRuntimeLoopRemainder(
Value *BranchVal =		Value *BranchVal =
UseEpilogRemainder ? B.CreateICmpULT(BECount,		UseEpilogRemainder ? B.CreateICmpULT(BECount,
ConstantInt::get(BECount->getType(),		ConstantInt::get(BECount->getType(),
Count - 1)) :		Count - 1)) :
B.CreateIsNotNull(ModVal, "lcmp.mod");		B.CreateIsNotNull(ModVal, "lcmp.mod");
BasicBlock *RemainderLoop = UseEpilogRemainder ? NewExit : PrologPreHeader;		BasicBlock *RemainderLoop = UseEpilogRemainder ? NewExit : PrologPreHeader;
BasicBlock *UnrollingLoop = UseEpilogRemainder ? NewPreHeader : PrologExit;		BasicBlock *UnrollingLoop = UseEpilogRemainder ? NewPreHeader : PrologExit;
// Branch to either remainder (extra iterations) loop or unrolling loop.		// Branch to either remainder (extra iterations) loop or unrolling loop.
B.CreateCondBr(BranchVal, RemainderLoop, UnrollingLoop);		MDNode *BranchWeights = nullptr;
		if (hasBranchWeightMD(*Latch->getTerminator())) {
		// Assume loop is nearly always entered.
		MDBuilder MDB(B.getContext());
		BranchWeights = MDB.createBranchWeights(EpilogHeaderWeights);
		mtrofin: one thought about the 1:127 ratio: how about we had a `MDBuilder::createUnrolledLoopSkipWeights()` doing the setting - more readable, API captures the intent, etc?
		hoy: Can this be computed based on the loop preheader count and the main loop trip count? IIUC, the block execution count of the preheader indicates how many times the loop is entered from outside the loop. Thus the loop trip count divided by the preheader count indicates an average trip count per loop execution. If that is greater than the unrolling factor, then the unrolled loop should be always executed?
		MatzeB (author):
		> how about we had a `MDBuilder::createUnrolledLoopSkipWeights()`
		As above I think this hurts readability as it moves the logic away into another file. Also if we decide this being the preferred method, then I'd rather see a mass-refactoring of relevant places rather than starting with one-offs in diffs.
		> Can this be computed based on the loop preheader count and the main loop trip count?
		I don't see how. The old preheader (if it existed at all) will give you the ratio of zero and non-zero trip counts. But this here needs to catch the cases of trip-counts being smaller than the unroll factor...
		mtrofin:
		> > how about we had a `MDBuilder::createUnrolledLoopSkipWeights()`
		>
		> As above I think this hurts readability as it moves the logic away into another file. Also if we decide this being the preferred method, then I'd rather see a mass-refactoring of relevant places rather than starting with one-offs in diffs.
		I can see your way for the above, but here, the way I see it is that the choice of numbers and the intent behind them is not immediately obvious, so an API would make that clear. At minimum the values could be a constant that's reused? (maybe the `ArrayRef` overload of `createBranchWeights` would take that constant or something like that) If this set of weights is this way elsewhere, doing a refactoring after makes sense, but (maybe I searched superficially) can't seem to find them other than in your recent patches on this topic, that's why I was thinking this may be the opportune moment - wdyt?
		hoy:
		> I don't see how. The old preheader (if it existed at all) will give you the ratio of zero and non-zero trip counts. But this here needs to catch the cases of trip-counts being smaller than the unroll factor...
		The old preheader can be just some block dominating the loop header, for the sake of block execution count. Right, the branch is comparing trip count per each loop entry with the unroll factor. But to simulate that probability at compile time, I feel we can use the aggregated loop trip count (i.e., profile count) divided by the preheader count to compare with the unroll factor. Let me know if this feels wrong.
		hoy: Talked offline. The average trip count is already used to check if loop unroll is beneficial. Here we are pretty sure the unrolled loop body is going to be run, so giving a 99.9...% ratio is reasonable.
		}
		B.CreateCondBr(BranchVal, RemainderLoop, UnrollingLoop, BranchWeights);
		mtrofin: should this logic be factored in a `getBranchWeights(Latch, Count)` - something like that?
		MatzeB (author): and put it where? I find that having the branch weight decisions right next to the code creating the branch helps readability of the code as you don't have to jump around.
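To make the 1:127 choice concrete: `branch_weights` are relative, so with `{1, 127}` on the guard branch (true successor `RemainderLoop`, i.e. skipping the unrolled loop; false successor `UnrollingLoop`) the skip edge gets probability 1/128 (about 0.78%) and the unrolled loop 127/128 (about 99.2%). The sketch below only performs that normalization; `edgeProbability` is a hypothetical helper, not LLVM's `BranchProbability` class.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: probability of successor Idx given relative weights,
// as listed in !{!"branch_weights", i32 W0, i32 W1, ...}.
double edgeProbability(const std::vector<uint32_t> &Weights, size_t Idx) {
  assert(Idx < Weights.size());
  uint64_t Total = 0;
  for (uint32_t W : Weights)
    Total += W;
  assert(Total > 0 && "all-zero weights carry no information");
  return static_cast<double>(Weights[Idx]) / static_cast<double>(Total);
}
```

For example, `edgeProbability({1, 127}, 0)` is `1.0 / 128`, the chance this model assigns to the trip count being too small for even one unrolled iteration.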
PreHeaderBR->eraseFromParent();		PreHeaderBR->eraseFromParent();
if (DT) {		if (DT) {
if (UseEpilogRemainder)		if (UseEpilogRemainder)
DT->changeImmediateDominator(NewExit, PreHeader);		DT->changeImmediateDominator(NewExit, PreHeader);
else		else
DT->changeImmediateDominator(PrologExit, PreHeader);		DT->changeImmediateDominator(PrologExit, PreHeader);
}		}
Function *F = Header->getParent();		Function *F = Header->getParent();
[12 lines not shown]	bool llvm::UnrollRuntimeLoopRemainder(

// Clone all the basic blocks in the loop. If Count is 2, we don't clone		// Clone all the basic blocks in the loop. If Count is 2, we don't clone
// the loop, otherwise we create a cloned loop to execute the extra		// the loop, otherwise we create a cloned loop to execute the extra
// iterations. This function adds the appropriate CFG connections.		// iterations. This function adds the appropriate CFG connections.
BasicBlock *InsertBot = UseEpilogRemainder ? LatchExit : PrologExit;		BasicBlock *InsertBot = UseEpilogRemainder ? LatchExit : PrologExit;
BasicBlock *InsertTop = UseEpilogRemainder ? EpilogPreHeader : PrologPreHeader;		BasicBlock *InsertTop = UseEpilogRemainder ? EpilogPreHeader : PrologPreHeader;
Loop *remainderLoop = CloneLoopBlocks(		Loop *remainderLoop = CloneLoopBlocks(
L, ModVal, UseEpilogRemainder, UnrollRemainder, InsertTop, InsertBot,		L, ModVal, UseEpilogRemainder, UnrollRemainder, InsertTop, InsertBot,
NewPreHeader, NewBlocks, LoopBlocks, VMap, DT, LI);		NewPreHeader, NewBlocks, LoopBlocks, VMap, DT, LI, Count);

// Assign the maximum possible trip count as the back edge weight for the
// remainder loop if the original loop comes with a branch weight.
if (remainderLoop && !UnrollRemainder)
updateLatchBranchWeightsForRemainderLoop(L, remainderLoop, Count);

// Insert the cloned blocks into the function.		// Insert the cloned blocks into the function.
F->splice(InsertBot->getIterator(), F, NewBlocks[0]->getIterator(), F->end());		F->splice(InsertBot->getIterator(), F, NewBlocks[0]->getIterator(), F->end());

// Now the loop blocks are cloned and the other exiting blocks from the		// Now the loop blocks are cloned and the other exiting blocks from the
// remainder are connected to the original Loop's exit blocks. The remaining		// remainder are connected to the original Loop's exit blocks. The remaining
// work is to update the phi nodes in the original loop, and take in the		// work is to update the phi nodes in the original loop, and take in the
// values from the cloned region.		// values from the cloned region.
[77 lines not shown]	for (Instruction &I : *BB) {
RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);		RF_NoModuleLevelChanges | RF_IgnoreMissingLocals);
}		}
}		}

if (UseEpilogRemainder) {		if (UseEpilogRemainder) {
// Connect the epilog code to the original loop and update the		// Connect the epilog code to the original loop and update the
// PHI functions.		// PHI functions.
ConnectEpilog(L, ModVal, NewExit, LatchExit, PreHeader, EpilogPreHeader,		ConnectEpilog(L, ModVal, NewExit, LatchExit, PreHeader, EpilogPreHeader,
NewPreHeader, VMap, DT, LI, PreserveLCSSA, *SE);		NewPreHeader, VMap, DT, LI, PreserveLCSSA, *SE, Count);

// Update counter in loop for unrolling.		// Update counter in loop for unrolling.
// Use an incrementing IV. Pre-incr/post-incr is backedge/trip count.		// Use an incrementing IV. Pre-incr/post-incr is backedge/trip count.
// Subtle: TestVal can be 0 if we wrapped when computing the trip count,		// Subtle: TestVal can be 0 if we wrapped when computing the trip count,
// thus we must compare the post-increment (wrapping) value.		// thus we must compare the post-increment (wrapping) value.
IRBuilder<> B2(NewPreHeader->getTerminator());		IRBuilder<> B2(NewPreHeader->getTerminator());
Value *TestVal = B2.CreateSub(TripCount, ModVal, "unroll_iter");		Value *TestVal = B2.CreateSub(TripCount, ModVal, "unroll_iter");
BranchInst *LatchBR = cast<BranchInst>(Latch->getTerminator());		BranchInst *LatchBR = cast<BranchInst>(Latch->getTerminator());
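The "subtle" wrap case described in the comment above can be checked with a scalar model. The sketch assumes an 8-bit induction variable and a power-of-two `Count` (so `ModVal = TripCount & (Count - 1)`), takes the epilog guard `BECount < Count - 1` from the branch built earlier in this function, and runs the unrolled loop as a do-while that compares the post-increment counter with `unroll_iter = TripCount - ModVal`. With `BECount = 255`, `TripCount` and `unroll_iter` both wrap to 0, yet the loop still executes 256 bodies because the wrapping `niter.next` returns to 0 only after `256 / Count` iterations. `simulateEpilogUnroll` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of epilog-style runtime unrolling with an 8-bit
// induction variable. Returns how many times the original loop body runs.
unsigned simulateEpilogUnroll(uint8_t BECount, unsigned Count) {
  assert(Count >= 2 && Count <= 128 && (Count & (Count - 1)) == 0);
  uint8_t TripCount = static_cast<uint8_t>(BECount + 1);          // may wrap
  uint8_t ModVal = static_cast<uint8_t>(TripCount & (Count - 1)); // xtraiter
  unsigned Bodies = 0;

  // Guard: skip the unrolled loop when fewer than Count iterations remain.
  if (!(BECount < Count - 1)) {
    uint8_t TestVal = static_cast<uint8_t>(TripCount - ModVal); // unroll_iter
    uint8_t NIter = 0;
    do {
      Bodies += Count;                             // Count copies of the body
      NIter = static_cast<uint8_t>(NIter + Count); // niter.next, wrapping
    } while (NIter != TestVal);                    // post-increment compare
  }
  // The epilog (remainder) loop runs ModVal iterations.
  Bodies += ModVal;
  return Bodies;
}
```

Comparing the pre-increment counter instead would see `NIter == TestVal == 0` on entry in the wrapped case, which is why the comment insists on the post-increment (wrapping) value.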
[94 lines not shown]

llvm/test/Transforms/LoopUnroll/runtime-exit-phi-scev-invalidation.ll

	[159 lines not shown]
	; CHECK-NEXT: [[TMP1:%.*]] = sub i64 [[TMP0]], [[X]]			; CHECK-NEXT: [[TMP1:%.*]] = sub i64 [[TMP0]], [[X]]
	; CHECK-NEXT: br label [[OUTER_HEADER:%.*]]			; CHECK-NEXT: br label [[OUTER_HEADER:%.*]]
	; CHECK: outer.header:			; CHECK: outer.header:
	; CHECK-NEXT: [[OUTER_P:%.*]] = phi i32 [ 0, [[BB:%.*]] ], [ [[L_1_LCSSA:%.*]], [[OUTER_LATCH:%.*]] ]			; CHECK-NEXT: [[OUTER_P:%.*]] = phi i32 [ 0, [[BB:%.*]] ], [ [[L_1_LCSSA:%.*]], [[OUTER_LATCH:%.*]] ]
	; CHECK-NEXT: [[TMP2:%.*]] = freeze i64 [[TMP1]]			; CHECK-NEXT: [[TMP2:%.*]] = freeze i64 [[TMP1]]
	; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[TMP2]], -1			; CHECK-NEXT: [[TMP3:%.*]] = add i64 [[TMP2]], -1
	; CHECK-NEXT: [[XTRAITER:%.*]] = and i64 [[TMP2]], 7			; CHECK-NEXT: [[XTRAITER:%.*]] = and i64 [[TMP2]], 7
	; CHECK-NEXT: [[LCMP_MOD:%.*]] = icmp ne i64 [[XTRAITER]], 0			; CHECK-NEXT: [[LCMP_MOD:%.*]] = icmp ne i64 [[XTRAITER]], 0
	; CHECK-NEXT: br i1 [[LCMP_MOD]], label [[INNER_1_HEADER_PROL_PREHEADER:%.*]], label [[INNER_1_HEADER_PROL_LOOPEXIT:%.*]]			; CHECK-NEXT: br i1 [[LCMP_MOD]], label [[INNER_1_HEADER_PROL_PREHEADER:%.*]], label [[INNER_1_HEADER_PROL_LOOPEXIT:%.*]], !prof [[PROF3:![0-9]+]]
	; CHECK: inner.1.header.prol.preheader:			; CHECK: inner.1.header.prol.preheader:
	; CHECK-NEXT: br label [[INNER_1_HEADER_PROL:%.*]]			; CHECK-NEXT: br label [[INNER_1_HEADER_PROL:%.*]]
	; CHECK: inner.1.header.prol:			; CHECK: inner.1.header.prol:
	; CHECK-NEXT: [[INNER_1_IV_PROL:%.*]] = phi i64 [ [[X]], [[INNER_1_HEADER_PROL_PREHEADER]] ], [ [[INNER_1_IV_NEXT_PROL:%.*]], [[INNER_1_LATCH_PROL:%.*]] ]			; CHECK-NEXT: [[INNER_1_IV_PROL:%.*]] = phi i64 [ [[X]], [[INNER_1_HEADER_PROL_PREHEADER]] ], [ [[INNER_1_IV_NEXT_PROL:%.*]], [[INNER_1_LATCH_PROL:%.*]] ]
	; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ 0, [[INNER_1_HEADER_PROL_PREHEADER]] ], [ [[PROL_ITER_NEXT:%.*]], [[INNER_1_LATCH_PROL]] ]			; CHECK-NEXT: [[PROL_ITER:%.*]] = phi i64 [ 0, [[INNER_1_HEADER_PROL_PREHEADER]] ], [ [[PROL_ITER_NEXT:%.*]], [[INNER_1_LATCH_PROL]] ]
	; CHECK-NEXT: [[CMP_1_PROL:%.*]] = icmp sgt i32 [[OUTER_P]], 0			; CHECK-NEXT: [[CMP_1_PROL:%.*]] = icmp sgt i32 [[OUTER_P]], 0
	; CHECK-NEXT: br i1 [[CMP_1_PROL]], label [[EXIT_DEOPT_LOOPEXIT1:%.*]], label [[INNER_1_LATCH_PROL]]			; CHECK-NEXT: br i1 [[CMP_1_PROL]], label [[EXIT_DEOPT_LOOPEXIT1:%.*]], label [[INNER_1_LATCH_PROL]]
	; CHECK: inner.1.latch.prol:			; CHECK: inner.1.latch.prol:
	; CHECK-NEXT: [[L_1_PROL:%.*]] = load i32, ptr [[SRC:%.*]], align 4			; CHECK-NEXT: [[L_1_PROL:%.*]] = load i32, ptr [[SRC:%.*]], align 4
	; CHECK-NEXT: store i32 [[L_1_PROL]], ptr [[DST:%.*]], align 8			; CHECK-NEXT: store i32 [[L_1_PROL]], ptr [[DST:%.*]], align 8
	; CHECK-NEXT: [[INNER_1_IV_NEXT_PROL]] = add i64 [[INNER_1_IV_PROL]], 1			; CHECK-NEXT: [[INNER_1_IV_NEXT_PROL]] = add i64 [[INNER_1_IV_PROL]], 1
	; CHECK-NEXT: [[CMP_2_PROL:%.*]] = icmp sgt i64 [[INNER_1_IV_PROL]], 0			; CHECK-NEXT: [[CMP_2_PROL:%.*]] = icmp sgt i64 [[INNER_1_IV_PROL]], 0
	; CHECK-NEXT: [[PROL_ITER_NEXT]] = add i64 [[PROL_ITER]], 1			; CHECK-NEXT: [[PROL_ITER_NEXT]] = add i64 [[PROL_ITER]], 1
	; CHECK-NEXT: [[PROL_ITER_CMP:%.*]] = icmp ne i64 [[PROL_ITER_NEXT]], [[XTRAITER]]			; CHECK-NEXT: [[PROL_ITER_CMP:%.*]] = icmp ne i64 [[PROL_ITER_NEXT]], [[XTRAITER]]
	; CHECK-NEXT: br i1 [[PROL_ITER_CMP]], label [[INNER_1_HEADER_PROL]], label [[INNER_1_HEADER_PROL_LOOPEXIT_UNR_LCSSA:%.*]], !prof [[PROF3:![0-9]+]], !llvm.loop [[LOOP4:![0-9]+]]			; CHECK-NEXT: br i1 [[PROL_ITER_CMP]], label [[INNER_1_HEADER_PROL]], label [[INNER_1_HEADER_PROL_LOOPEXIT_UNR_LCSSA:%.*]], !prof [[PROF4:![0-9]+]], !llvm.loop [[LOOP5:![0-9]+]]
	; CHECK: inner.1.header.prol.loopexit.unr-lcssa:			; CHECK: inner.1.header.prol.loopexit.unr-lcssa:
	; CHECK-NEXT: [[L_1_LCSSA_UNR_PH:%.*]] = phi i32 [ [[L_1_PROL]], [[INNER_1_LATCH_PROL]] ]			; CHECK-NEXT: [[L_1_LCSSA_UNR_PH:%.*]] = phi i32 [ [[L_1_PROL]], [[INNER_1_LATCH_PROL]] ]
	; CHECK-NEXT: [[INNER_1_IV_UNR_PH:%.*]] = phi i64 [ [[INNER_1_IV_NEXT_PROL]], [[INNER_1_LATCH_PROL]] ]			; CHECK-NEXT: [[INNER_1_IV_UNR_PH:%.*]] = phi i64 [ [[INNER_1_IV_NEXT_PROL]], [[INNER_1_LATCH_PROL]] ]
	; CHECK-NEXT: br label [[INNER_1_HEADER_PROL_LOOPEXIT]]			; CHECK-NEXT: br label [[INNER_1_HEADER_PROL_LOOPEXIT]]
	; CHECK: inner.1.header.prol.loopexit:			; CHECK: inner.1.header.prol.loopexit:
	; CHECK-NEXT: [[L_1_LCSSA_UNR:%.*]] = phi i32 [ undef, [[OUTER_HEADER]] ], [ [[L_1_LCSSA_UNR_PH]], [[INNER_1_HEADER_PROL_LOOPEXIT_UNR_LCSSA]] ]			; CHECK-NEXT: [[L_1_LCSSA_UNR:%.*]] = phi i32 [ undef, [[OUTER_HEADER]] ], [ [[L_1_LCSSA_UNR_PH]], [[INNER_1_HEADER_PROL_LOOPEXIT_UNR_LCSSA]] ]
	; CHECK-NEXT: [[INNER_1_IV_UNR:%.*]] = phi i64 [ [[X]], [[OUTER_HEADER]] ], [ [[INNER_1_IV_UNR_PH]], [[INNER_1_HEADER_PROL_LOOPEXIT_UNR_LCSSA]] ]			; CHECK-NEXT: [[INNER_1_IV_UNR:%.*]] = phi i64 [ [[X]], [[OUTER_HEADER]] ], [ [[INNER_1_IV_UNR_PH]], [[INNER_1_HEADER_PROL_LOOPEXIT_UNR_LCSSA]] ]
	; CHECK-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP3]], 7			; CHECK-NEXT: [[TMP4:%.*]] = icmp ult i64 [[TMP3]], 7
	; CHECK-NEXT: br i1 [[TMP4]], label [[OUTER_MIDDLE:%.*]], label [[OUTER_HEADER_NEW:%.*]]			; CHECK-NEXT: br i1 [[TMP4]], label [[OUTER_MIDDLE:%.*]], label [[OUTER_HEADER_NEW:%.*]], !prof [[PROF3]]
	; CHECK: outer.header.new:			; CHECK: outer.header.new:
	; CHECK-NEXT: br label [[INNER_1_HEADER:%.*]]			; CHECK-NEXT: br label [[INNER_1_HEADER:%.*]]
	; CHECK: inner.1.header:			; CHECK: inner.1.header:
	; CHECK-NEXT: [[INNER_1_IV:%.*]] = phi i64 [ [[INNER_1_IV_UNR]], [[OUTER_HEADER_NEW]] ], [ [[INNER_1_IV_NEXT_7:%.*]], [[INNER_1_LATCH_7:%.*]] ]			; CHECK-NEXT: [[INNER_1_IV:%.*]] = phi i64 [ [[INNER_1_IV_UNR]], [[OUTER_HEADER_NEW]] ], [ [[INNER_1_IV_NEXT_7:%.*]], [[INNER_1_LATCH_7:%.*]] ]
	; CHECK-NEXT: [[CMP_1:%.*]] = icmp sgt i32 [[OUTER_P]], 0			; CHECK-NEXT: [[CMP_1:%.*]] = icmp sgt i32 [[OUTER_P]], 0
	; CHECK-NEXT: br i1 [[CMP_1]], label [[EXIT_DEOPT_LOOPEXIT:%.*]], label [[INNER_1_LATCH:%.*]]			; CHECK-NEXT: br i1 [[CMP_1]], label [[EXIT_DEOPT_LOOPEXIT:%.*]], label [[INNER_1_LATCH:%.*]]
	; CHECK: inner.1.latch:			; CHECK: inner.1.latch:
	; CHECK-NEXT: [[L_1:%.*]] = load i32, ptr [[SRC]], align 4			; CHECK-NEXT: [[L_1:%.*]] = load i32, ptr [[SRC]], align 4
	[27 lines not shown]
	; CHECK-NEXT: store i32 [[L_1_6]], ptr [[DST]], align 8			; CHECK-NEXT: store i32 [[L_1_6]], ptr [[DST]], align 8
	; CHECK-NEXT: [[INNER_1_IV_NEXT_6:%.*]] = add i64 [[INNER_1_IV]], 7			; CHECK-NEXT: [[INNER_1_IV_NEXT_6:%.*]] = add i64 [[INNER_1_IV]], 7
	; CHECK-NEXT: br i1 false, label [[EXIT_DEOPT_LOOPEXIT]], label [[INNER_1_LATCH_7]]			; CHECK-NEXT: br i1 false, label [[EXIT_DEOPT_LOOPEXIT]], label [[INNER_1_LATCH_7]]
	; CHECK: inner.1.latch.7:			; CHECK: inner.1.latch.7:
	; CHECK-NEXT: [[L_1_7:%.*]] = load i32, ptr [[SRC]], align 4			; CHECK-NEXT: [[L_1_7:%.*]] = load i32, ptr [[SRC]], align 4
	; CHECK-NEXT: store i32 [[L_1_7]], ptr [[DST]], align 8			; CHECK-NEXT: store i32 [[L_1_7]], ptr [[DST]], align 8
	; CHECK-NEXT: [[INNER_1_IV_NEXT_7]] = add i64 [[INNER_1_IV]], 8			; CHECK-NEXT: [[INNER_1_IV_NEXT_7]] = add i64 [[INNER_1_IV]], 8
	; CHECK-NEXT: [[CMP_2_7:%.*]] = icmp sgt i64 [[INNER_1_IV_NEXT_6]], 0			; CHECK-NEXT: [[CMP_2_7:%.*]] = icmp sgt i64 [[INNER_1_IV_NEXT_6]], 0
	; CHECK-NEXT: br i1 [[CMP_2_7]], label [[OUTER_MIDDLE_UNR_LCSSA:%.*]], label [[INNER_1_HEADER]], !prof [[PROF5:![0-9]+]]			; CHECK-NEXT: br i1 [[CMP_2_7]], label [[OUTER_MIDDLE_UNR_LCSSA:%.*]], label [[INNER_1_HEADER]], !prof [[PROF6:![0-9]+]]
	; CHECK: outer.middle.unr-lcssa:			; CHECK: outer.middle.unr-lcssa:
	; CHECK-NEXT: [[L_1_LCSSA_PH:%.*]] = phi i32 [ [[L_1_7]], [[INNER_1_LATCH_7]] ]			; CHECK-NEXT: [[L_1_LCSSA_PH:%.*]] = phi i32 [ [[L_1_7]], [[INNER_1_LATCH_7]] ]
	; CHECK-NEXT: br label [[OUTER_MIDDLE]]			; CHECK-NEXT: br label [[OUTER_MIDDLE]]
	; CHECK: outer.middle:			; CHECK: outer.middle:
	; CHECK-NEXT: [[L_1_LCSSA]] = phi i32 [ [[L_1_LCSSA_UNR]], [[INNER_1_HEADER_PROL_LOOPEXIT]] ], [ [[L_1_LCSSA_PH]], [[OUTER_MIDDLE_UNR_LCSSA]] ]			; CHECK-NEXT: [[L_1_LCSSA]] = phi i32 [ [[L_1_LCSSA_UNR]], [[INNER_1_HEADER_PROL_LOOPEXIT]] ], [ [[L_1_LCSSA_PH]], [[OUTER_MIDDLE_UNR_LCSSA]] ]
	; CHECK-NEXT: br label [[INNER_2:%.*]]			; CHECK-NEXT: br label [[INNER_2:%.*]]
	; CHECK: inner.2:			; CHECK: inner.2:
	; CHECK-NEXT: [[INNER_2_IV:%.*]] = phi i32 [ [[L_1_LCSSA]], [[OUTER_MIDDLE]] ], [ [[INNER_2_IV_NEXT_2:%.*]], [[INNER_2]] ]			; CHECK-NEXT: [[INNER_2_IV:%.*]] = phi i32 [ [[L_1_LCSSA]], [[OUTER_MIDDLE]] ], [ [[INNER_2_IV_NEXT_2:%.*]], [[INNER_2]] ]
	[112 lines not shown]

llvm/test/Transforms/LoopUnroll/runtime-loop-branchweight.ll

	; RUN: opt < %s -S -passes=loop-unroll -unroll-runtime=true -unroll-count=4 | FileCheck %s			; RUN: opt < %s -S -passes=loop-unroll -unroll-runtime=true -unroll-count=4 | FileCheck %s

	;; Check that the remainder loop is properly assigned a branch weight for its latch branch.			;; Check that the remainder loop is properly assigned a branch weight for its latch branch.
	; CHECK-LABEL: @test(			; CHECK-LABEL: @test(
	; CHECK-LABEL: for.body:			; CHECK-LABEL: for.body:
	; CHECK: br i1 [[COND1:%.*]], label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !prof ![[#PROF:]], !llvm.loop ![[#LOOP:]]			; CHECK: br i1 [[COND1:%.*]], label %for.end.loopexit.unr-lcssa.loopexit, label %for.body, !prof ![[#PROF:]], !llvm.loop ![[#LOOP:]]
	; CHECK-LABEL: for.body.epil:			; CHECK-LABEL: for.body.epil:
	; CHECK: br i1 [[COND2:%.*]], label %for.body.epil, label %for.end.loopexit.epilog-lcssa, !prof ![[#PROF2:]], !llvm.loop ![[#LOOP2:]]			; CHECK: br i1 [[COND2:%.*]], label %for.body.epil, label %for.end.loopexit.epilog-lcssa, !prof ![[#PROF2:]], !llvm.loop ![[#LOOP2:]]
	; CHECK: ![[#PROF]] = !{!"branch_weights", i32 1, i32 2499}			; CHECK: ![[#PROF]] = !{!"branch_weights", i32 1, i32 2499}
	; CHECK: ![[#PROF2]] = !{!"branch_weights", i32 3, i32 1}			; CHECK: ![[#PROF2]] = !{!"branch_weights", i32 1, i32 1}

	define i3 @test(ptr %a, i3 %n) {			define i3 @test(ptr %a, i3 %n) {
	entry:			entry:
	%cmp1 = icmp eq i3 %n, 0			%cmp1 = icmp eq i3 %n, 0
	br i1 %cmp1, label %for.end, label %for.body			br i1 %cmp1, label %for.end, label %for.body

	for.body:			for.body:
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	[15 lines not shown]

llvm/test/Transforms/LoopUnroll/runtime-loop.ll

	[12 lines not shown]
	; RUN: opt < %s -S -passes='require<opt-remark-emit>,loop(loop-unroll-full)' -unroll-runtime=true -unroll-runtime-epilog=false | FileCheck %s -check-prefixes=NOPROLOG,COMMON			; RUN: opt < %s -S -passes='require<opt-remark-emit>,loop(loop-unroll-full)' -unroll-runtime=true -unroll-runtime-epilog=false | FileCheck %s -check-prefixes=NOPROLOG,COMMON

	target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"			target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"

	; Tests for unrolling loops with run-time trip counts			; Tests for unrolling loops with run-time trip counts

	; COMMON-LABEL: @test(			; COMMON-LABEL: @test(

				; EPILOG: entry:
				; EPILOG: br i1 %cmp1, label %for.end, label %for.body.preheader, !prof [[EPILOG_PROF_0:![0-9]+]]
				; EPILOG: for.body.preheader:
	; EPILOG: %xtraiter = and i32 %n			; EPILOG: %xtraiter = and i32 %n
				; EPILOG: br i1 %1, label %for.end.loopexit.unr-lcssa, label %for.body.preheader.new, !prof [[EPILOG_PROF_1:![0-9]+]]

				; EPILOG: for.end.loopexit.unr-lcssa:
	; EPILOG: %lcmp.mod = icmp ne i32 %xtraiter, 0			; EPILOG: %lcmp.mod = icmp ne i32 %xtraiter, 0
	; EPILOG: br i1 %lcmp.mod, label %for.body.epil.preheader, label %for.end.loopexit			; EPILOG: br i1 %lcmp.mod, label %for.body.epil.preheader, label %for.end.loopexit, !prof [[EPILOG_PROF_2:![0-9]+]]

	; NOEPILOG-NOT: %xtraiter = and i32 %n			; NOEPILOG-NOT: %xtraiter = and i32 %n

				; PROLOG: entry:
				; PROLOG: br i1 %cmp1, label %for.end, label %for.body.preheader, !prof [[PROLOG_PROF_0:![0-9]+]]

				; PROLOG: for.body.preheader:
	; PROLOG: %xtraiter = and i32 %n			; PROLOG: %xtraiter = and i32 %n
	; PROLOG: %lcmp.mod = icmp ne i32 %xtraiter, 0			; PROLOG: %lcmp.mod = icmp ne i32 %xtraiter, 0
	; PROLOG: br i1 %lcmp.mod, label %for.body.prol.preheader, label %for.body.prol.loopexit			; PROLOG: br i1 %lcmp.mod, label %for.body.prol.preheader, label %for.body.prol.loopexit, !prof [[PROLOG_PROF_1:![0-9]+]]

	; NOPROLOG-NOT: %xtraiter = and i32 %n			; NOPROLOG-NOT: %xtraiter = and i32 %n

	; EPILOG: for.body.epil:			; EPILOG: for.body.epil:
	; EPILOG: %indvars.iv.epil = phi i64 [ %indvars.iv.next.epil, %for.body.epil ], [ %indvars.iv.unr, %for.body.epil.preheader ]			; EPILOG: %indvars.iv.epil = phi i64 [ %indvars.iv.next.epil, %for.body.epil ], [ %indvars.iv.unr, %for.body.epil.preheader ]
	; EPILOG: %epil.iter.next = add i32 %epil.iter, 1			; EPILOG: %epil.iter.next = add i32 %epil.iter, 1
	; EPILOG: %epil.iter.cmp = icmp ne i32 %epil.iter.next, %xtraiter			; EPILOG: %epil.iter.cmp = icmp ne i32 %epil.iter.next, %xtraiter
	; EPILOG: br i1 %epil.iter.cmp, label %for.body.epil, label %for.end.loopexit.epilog-lcssa, !llvm.loop !0			; EPILOG: br i1 %epil.iter.cmp, label %for.body.epil, label %for.end.loopexit.epilog-lcssa, !prof [[EPILOG_PROF_3:![0-9]+]], !llvm.loop [[EPILOG_LOOP:![0-9]+]]

	; NOEPILOG: for.body:			; NOEPILOG: for.body:
	; NOEPILOG-NOT: for.body.epil:			; NOEPILOG-NOT: for.body.epil:

	; PROLOG: for.body.prol:			; PROLOG: for.body.prol:
	; PROLOG: %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body.prol ], [ 0, %for.body.prol.preheader ]			; PROLOG: %indvars.iv.prol = phi i64 [ %indvars.iv.next.prol, %for.body.prol ], [ 0, %for.body.prol.preheader ]
	; PROLOG: %prol.iter.next = add i32 %prol.iter, 1			; PROLOG: %prol.iter.next = add i32 %prol.iter, 1
	; PROLOG: %prol.iter.cmp = icmp ne i32 %prol.iter.next, %xtraiter			; PROLOG: %prol.iter.cmp = icmp ne i32 %prol.iter.next, %xtraiter
	; PROLOG: br i1 %prol.iter.cmp, label %for.body.prol, label %for.body.prol.loopexit.unr-lcssa, !llvm.loop !0			; PROLOG: br i1 %prol.iter.cmp, label %for.body.prol, label %for.body.prol.loopexit.unr-lcssa, !prof [[PROLOG_PROF_2:![0-9]+]], !llvm.loop [[PROLOG_LOOP:![0-9]+]]

				; PROLOG: for.body.prol.loopexit:
				; PROLOG: br i1 %2, label %for.end.loopexit, label %for.body.preheader.new, !prof [[PROLOG_PROF_1:![0-9]+]]

	; NOPROLOG: for.body:			; NOPROLOG: for.body:
	; NOPROLOG-NOT: for.body.prol:			; NOPROLOG-NOT: for.body.prol:


	define i32 @test(ptr nocapture %a, i32 %n) nounwind uwtable readonly {			define i32 @test(ptr nocapture %a, i32 %n) nounwind uwtable readonly !prof !2 {
	entry:			entry:
	%cmp1 = icmp eq i32 %n, 0			%cmp1 = icmp eq i32 %n, 0
	br i1 %cmp1, label %for.end, label %for.body			br i1 %cmp1, label %for.end, label %for.body, !prof !3

	for.body: ; preds = %for.body, %entry			for.body: ; preds = %for.body, %entry
	%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]			%indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
	%sum.02 = phi i32 [ %add, %for.body ], [ 0, %entry ]			%sum.02 = phi i32 [ %add, %for.body ], [ 0, %entry ]
	%arrayidx = getelementptr inbounds i32, ptr %a, i64 %indvars.iv			%arrayidx = getelementptr inbounds i32, ptr %a, i64 %indvars.iv
	%0 = load i32, ptr %arrayidx, align 4			%0 = load i32, ptr %arrayidx, align 4
	%add = add nsw i32 %0, %sum.02			%add = add nsw i32 %0, %sum.02
	%indvars.iv.next = add i64 %indvars.iv, 1			%indvars.iv.next = add i64 %indvars.iv, 1
	%lftr.wideiv = trunc i64 %indvars.iv.next to i32			%lftr.wideiv = trunc i64 %indvars.iv.next to i32
	%exitcond = icmp eq i32 %lftr.wideiv, %n			%exitcond = icmp eq i32 %lftr.wideiv, %n
	br i1 %exitcond, label %for.end, label %for.body			br i1 %exitcond, label %for.end, label %for.body, !prof !4

	for.end: ; preds = %for.body, %entry			for.end: ; preds = %for.body, %entry
	%sum.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.body ]			%sum.0.lcssa = phi i32 [ 0, %entry ], [ %add, %for.body ]
	ret i32 %sum.0.lcssa			ret i32 %sum.0.lcssa
	}			}


	; Still try to completely unroll loops with compile-time trip counts			; Still try to completely unroll loops with compile-time trip counts
	[194 lines not shown]
	ret void			ret void

	exit2.loopexit:			exit2.loopexit:
	ret void			ret void
	}			}

	!0 = distinct !{!0, !1}			!0 = distinct !{!0, !1}
	!1 = !{!"llvm.loop.unroll.runtime.disable"}			!1 = !{!"llvm.loop.unroll.runtime.disable"}
				!2 = !{!"function_entry_count", i64 1}
				!3 = !{!"branch_weights", i32 1, i32 11}
				!4 = !{!"branch_weights", i32 1, i32 42}

	; need to use LABEL here to separate function IR matching from metadata matching			; need to use LABEL here to separate function IR matching from metadata matching
	; COMMON-LABEL: {{^}}!0 =			; COMMON-LABEL: {{^}}!0 =

	; EPILOG-SAME: distinct !{!0, !1}			; EPILOG: [[EPILOG_PROF_0]] = !{!"branch_weights", i32 1, i32 11}
	; EPILOG: !1 = !{!"llvm.loop.unroll.disable"}			; EPILOG: [[EPILOG_PROF_1]] = !{!"branch_weights", i32 1, i32 127}
				; EPILOG: [[EPILOG_PROF_2]] = !{!"branch_weights", i32 1, i32 7}
				; EPILOG: [[EPILOG_PROF_3]] = !{!"branch_weights", i32 3, i32 1}

				; EPILOG: [[EPILOG_LOOP]] = distinct !{[[EPILOG_LOOP]], [[EPILOG_LOOP_1:![0-9]+]]}
				; EPILOG: [[EPILOG_LOOP_1]] = !{!"llvm.loop.unroll.disable"}

				; PROLOG: [[PROLOG_PROF_0]] = !{!"branch_weights", i32 1, i32 11}
				; PROLOG: [[PROLOG_PROF_1]] = !{!"branch_weights", i32 1, i32 127}
				; PROLOG: [[PROLOG_PROF_2]] = !{!"branch_weights", i32 3, i32 1}

	; PROLOG-SAME: distinct !{!0, !1}			; PROLOG: distinct !{[[PROLOG_LOOP]], [[PROLOG_LOOP_1:![0-9]+]]}
	; PROLOG: !1 = !{!"llvm.loop.unroll.disable"}			; PROLOG: [[PROLOG_LOOP_1]] = !{!"llvm.loop.unroll.disable"}
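The branch weights expected by the CHECK lines above can be reproduced with a little arithmetic. The C++ sketch below is a hypothetical model, not the code in LoopUnrollRuntime.cpp, and its helper names are invented. It assumes an unroll count of 8, which is consistent with the 1:7 and 3:1 values even though the RUN lines are not shown here. The 1:127 constant comes straight from the revision summary. The remainder weights follow from assuming that the remainder is uniformly distributed, which is the assumption the summary describes for the reworked epilogue latch formula.

```cpp
#include <cstdint>

// Hypothetical model (not LLVM API) of the branch weights expected by the
// runtime-loop.ll CHECK lines.

struct Weights {
  uint32_t First;   // weight of the first successor
  uint32_t Second;  // weight of the second successor
};

struct LatchWeights {
  uint32_t BackEdge;  // weight of the edge back to the loop header
  uint32_t Exit;      // weight of the loop exit
};

// Check that skips the whole unrolled loop: fixed 1:127, per the summary.
constexpr Weights kSkipWholeLoopWeights{1, 127};

// Remainder check (assumes Count >= 2): if TripCount % Count is uniform over
// [0, Count), the remainder is zero with probability 1 / Count, which gives
// a 1:(Count - 1) split.
constexpr Weights remainderCheckWeights(uint32_t Count) {
  return {1, Count - 1};
}

// Remainder-loop latch: the remainder loop only runs for a nonzero remainder,
// assumed uniform over [1, Count). Its mean is Count / 2 iterations, so the
// back edge is taken about Count / 2 - 1 = (Count - 2) / 2 times per exit
// (rounded down). For Count < 3 the back edge should never be taken.
constexpr LatchWeights remainderLatchWeights(uint32_t Count) {
  return {Count >= 3 ? (Count - 2) / 2 : 0u, 1};
}

// With an (assumed) unroll count of 8 this yields 1:7 and 3:1.
static_assert(remainderCheckWeights(8).Second == 7, "expected 1:7");
static_assert(remainderLatchWeights(8).BackEdge == 3, "expected 3:1");
```

Because the remainder loop is entered only when the remainder is nonzero, the expected iteration count is Count / 2 rather than (Count - 1) / 2. That is why the remainder-loop latch weights drop from 7:1 to 3:1 for a count of 8.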

llvm/test/Transforms/LoopUnroll/unroll-heuristics-pgo.ll

	; RUN: opt < %s -S -passes=loop-unroll -unroll-runtime -unroll-threshold=40 -unroll-max-percent-threshold-boost=100 | FileCheck %s			; RUN: opt < %s -S -passes=loop-unroll -unroll-runtime -unroll-threshold=40 -unroll-max-percent-threshold-boost=100 | FileCheck %s

	@known_constant = internal unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16			@known_constant = internal unnamed_addr constant [9 x i32] [i32 0, i32 -1, i32 0, i32 -1, i32 5, i32 -1, i32 0, i32 -1, i32 0], align 16

	; CHECK-LABEL: @bar_prof			; CHECK-LABEL: @bar_prof
	; CHECK: loop:			; CHECK: loop:
	; CHECK: %mul = mul			; CHECK: %mul = mul
	; CHECK: %mul.1 = mul			; CHECK: %mul.1 = mul
	; CHECK: %mul.2 = mul			; CHECK: %mul.2 = mul
	; CHECK: %mul.3 = mul			; CHECK: %mul.3 = mul
	; CHECK: br i1 %niter.ncmp.7, label %loop.end.unr-lcssa.loopexit, label %loop, !prof !1			; CHECK: br i1 %niter.ncmp.7, label %loop.end.unr-lcssa.loopexit, label %loop, !prof [[PROF0:![0-9]+]]
	; CHECK: loop.epil:			; CHECK: loop.epil:
	; CHECK: br i1 %epil.iter.cmp, label %loop.epil, label %loop.end.epilog-lcssa, !prof !2, !llvm.loop !3			; CHECK: br i1 %epil.iter.cmp, label %loop.epil, label %loop.end.epilog-lcssa, !prof [[PROF1:![0-9]+]], !llvm.loop {{![0-9]+}}
	define i32 @bar_prof(ptr noalias nocapture readonly %src, i64 %c) !prof !1 {			define i32 @bar_prof(ptr noalias nocapture readonly %src, i64 %c) !prof !1 {
	entry:			entry:
	br label %loop			br label %loop

	loop:			loop:
	%iv = phi i64 [ 0, %entry ], [ %inc, %loop ]			%iv = phi i64 [ 0, %entry ], [ %inc, %loop ]
	%r = phi i32 [ 0, %entry ], [ %add, %loop ]			%r = phi i32 [ 0, %entry ], [ %add, %loop ]
	%arrayidx = getelementptr inbounds i32, ptr %src, i64 %iv			%arrayidx = getelementptr inbounds i32, ptr %src, i64 %iv
	(33 unchanged lines not shown)
	loop.end:			loop.end:
	%r.lcssa = phi i32 [ %r, %loop ]			%r.lcssa = phi i32 [ %r, %loop ]
	ret i32 %r.lcssa			ret i32 %r.lcssa
	}			}

	!1 = !{!"function_entry_count", i64 1}			!1 = !{!"function_entry_count", i64 1}
	!2 = !{!"branch_weights", i32 1, i32 1000}			!2 = !{!"branch_weights", i32 1, i32 1000}

	; CHECK: !1 = !{!"branch_weights", i32 1, i32 124}			; CHECK: [[PROF0]] = !{!"branch_weights", i32 1, i32 124}
	; CHECK: !2 = !{!"branch_weights", i32 7, i32 1}			; CHECK: [[PROF1]] = !{!"branch_weights", i32 3, i32 1}
	No newline at end of file
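Apart from the switch to a FileCheck variable, the 1:124 weights on the unrolled latch in unroll-heuristics-pgo.ll are unchanged by this revision. They can be derived from the original 1:1000 latch weights. The sketch below is a hypothetical C++ model, not LLVM's implementation, and requires C++14. It assumes that the trip count is estimated as back-edge weight divided by exit weight, rounded to nearest, plus one. It also assumes an unroll factor of 8, which the %niter.ncmp.7 name suggests.

```cpp
#include <cstdint>

// Hypothetical model (not LLVM API) of how the 1:124 unrolled-latch weights
// follow from the original 1:1000 latch weights. Requires C++14.

// Estimated trip count: back-edge weight / exit weight, rounded to nearest,
// plus one for the iteration that leaves the loop.
constexpr uint64_t estimatedTripCount(uint64_t BackEdge, uint64_t Exit) {
  return (BackEdge + Exit / 2) / Exit + 1;
}

struct UnrolledLatchWeights {
  uint64_t Exit;      // weight of the branch leaving the unrolled loop
  uint64_t BackEdge;  // weight of the branch back to the unrolled header
};

// Unrolling by Count divides the estimated trip count (rounded down). The
// result is re-encoded as one exit per (NewTripCount - 1) back edges.
constexpr UnrolledLatchWeights unrolledLatchWeights(uint64_t BackEdge,
                                                    uint64_t Exit,
                                                    uint64_t Count) {
  uint64_t NewTripCount = estimatedTripCount(BackEdge, Exit) / Count;
  return {1, NewTripCount > 0 ? NewTripCount - 1 : 0};
}

// 1:1000 -> 1001 iterations -> 1001 / 8 = 125 unrolled iterations -> 1:124.
static_assert(unrolledLatchWeights(1000, 1, 8).BackEdge == 124, "1:124");
```

The exit weight comes first because the unrolled latch branches to %loop.end.unr-lcssa.loopexit when %niter.ncmp.7 is true and back to %loop otherwise.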