This is an archive of the discontinued LLVM Phabricator instance.


Differential D67905

[LV] Vectorizer should adjust trip count in profile information
ClosedPublic

Authored by ebrevnov on Sep 23 2019, 4:33 AM.

Details

Reviewers

hsaito
Ayal
fhahn
reames
silvas
dcaballe
SjoerdMeijer
mkuper
DaniilSuchkov

Commits

rGaf7e1588727c: [LV] Vectorizer should adjust trip count in profile information

Summary

A vectorized loop processes VFxUF elements in one iteration, so its total number of iterations decreases proportionally. In addition, the epilog loop cannot have more than VFxUF - 1 iterations. This patch updates the profile information accordingly.
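
As a rough illustration of the arithmetic in the summary, the sketch below splits an original trip count between the vector loop and the epilog loop. The names `TripCountSplit` and `splitTripCount` are hypothetical and not part of the patch, and the sketch assumes that no runtime guard bypasses the vector loop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration only; not code from the patch.
struct TripCountSplit {
  uint64_t VectorIterations; // iterations executed by the vectorized loop
  uint64_t EpilogIterations; // iterations left over for the scalar epilog
};

// Split an original trip count TC for a given VF and UF.
TripCountSplit splitTripCount(uint64_t TC, uint64_t VF, uint64_t UF) {
  assert(VF > 0 && UF > 0 && "VF and UF must be positive");
  const uint64_t Step = VF * UF; // elements processed per vector iteration
  // The remainder of a division by Step is at most Step - 1,
  // which is the epilog bound stated in the summary.
  return {TC / Step, TC % Step};
}
```

For instance, `splitTripCount(103, 4, 2)` gives 12 vector iterations and 7 epilog iterations.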

Diff Detail

Repository

rG LLVM Github Monorepo

Build Status

Buildable 42093
Build 42472: arc lint + arc unit

Event Timeline

ebrevnov created this revision. Sep 23 2019, 4:33 AM

Herald added a project: Restricted Project. Sep 23 2019, 4:33 AM

Herald added subscribers: llvm-commits, rkruppe, hiraditya.

Harbormaster completed remote builds in B38417: Diff 221285. Sep 23 2019, 4:34 AM

ebrevnov added reviewers: hsaito, Ayal, fhahn, reames. Sep 23 2019, 4:36 AM

Minor test update

ping

ebrevnov added reviewers: silvas, dcaballe, SjoerdMeijer. Oct 24 2019, 4:18 AM

ebrevnov added a reviewer: mkuper. Oct 24 2019, 11:11 PM

ebrevnov added a reviewer: DaniilSuchkov. Nov 12 2019, 1:55 AM

LGTM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3974	Nit: VFxUF - 1
3980–3982	Style: it is usually advised to turn such conditions into early exits, it would reduce required indentation and slightly improve readability.
4001	Maybe introduce a new variable for this value (like EpilogueTakenCount)? Right now it's a bit surprising that OrigSomething is being changed. Same goes to OrigFallThroughCount.

This revision is now accepted and ready to land. Nov 12 2019, 10:20 PM

Minor fixes as requested by reviewer.

I realized that the current implementation has a flaw: we should take into account that the actual number of iterations is one greater than the back edge taken count. In addition, I believe that the current structuring of the calculations is easier to understand.

LGTM

ebrevnov added a parent revision: D67805: [LV] Allow vectorization of hot short trip count loops with epilog. Nov 20 2019, 3:43 AM

Adding a few comments. Would be good to generalize and apply also to loop unroll (and jam).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3983	OrigFallThroughCount can still be either the exit count or the continue-to-next-iteration count, according to the code below. Wait to test if it's zero until we know what it stands for?
3995	Better use distinct names, e.g., OrigExitCount and OrigBackedgeTakenCount, than continue to call them Taken and FallThrough. Perhaps use Weight instead of Count, to denote total profile frequencies, as the latter is used elsewhere to denote the actual per-invocation TripCount.
3998	bel[l]ow
4000	How about "OrigAverageTripCount"? Explanation about its computation: OrigAverageTripCount = (number of times header block was executed) / (number of times header was reached from pre-header == number of times latch exited) == (OrigTakenCount + OrigFallThroughCount) / OrigFallThroughCount == OrigTakenCount / OrigFallThroughCount + 1.
4002	How about VecAverageTripCount = OrigAverageTripCount / (VF * UF);
4006	Just to clarify, maintaining branch frequencies through optimizations is best-effort and imprecise - a total weight that is not divisible by VF*UF implies that the trip count of at least one invocation was not divisible by VF*UF, not necessarily all of them; w/o considering also the distribution of trip counts in addition to their sum. Setting PEIterCount = 0 and VecAverageTripCount = round(OrigAverageTripCount / (VF*UF)) when Cost->foldTailByMasking() is probably the best that can be done. The former is redundant given that it applies to dead code, and the latter should perhaps apply to all cases, in general.
4010	There's also the special case of requiresScalarEpilogue() where 0 < PEIterCount <= VF*UF for each invocation of the loop, and hence the average is also strictly positive FWIW. But best keep the approximation general instead of trying to improve it, given general lack of information.
4017	This assumes the number of times the vector loop will be reached is equal to the number of times the original scalar loop was reached (OrigFallThroughCount). This holds if Cost->foldTailByMasking(), but otherwise invocations whose trip count < VF*UF will bypass the vector loop (and also == VF*UF if requiresScalarEpilogue()), plus other run time guards.
4026	Similar to above comment, invocations whose trip count is divisible by VF*UF will bypass the scalar remainder loop (w/o foldTailByMasking nor requiresScalarEpilogue), so in general PEFallThroughCount <= OrigFallThroughCount.
llvm/test/Transforms/LoopVectorize/check-prof-info.ll
4	May want to also check with UF>1.
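
The average trip count computation suggested in the comment on line 4000 can be written out as a worked equation. The symbols $W_{\text{taken}}$ and $W_{\text{exit}}$ are shorthand introduced here for OrigTakenCount and OrigFallThroughCount, and the derivation assumes, as the comment does, that the header is reached from the pre-header as often as the latch exits.

```latex
\[
\text{OrigAverageTripCount}
  = \frac{W_{\text{header}}}{W_{\text{entry}}}
  = \frac{W_{\text{taken}} + W_{\text{exit}}}{W_{\text{exit}}}
  = \frac{W_{\text{taken}}}{W_{\text{exit}}} + 1,
\qquad W_{\text{entry}} = W_{\text{exit}}.
\]
```

With $W_{\text{taken}} = 10000$ and $W_{\text{exit}} = 10$, for example, this gives $10000 / 10 + 1 = 1001$.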

Addressed Ayal's comments

ebrevnov marked an inline comment as done. Nov 21 2019, 4:59 AM

ebrevnov added inline comments. Nov 21 2019, 4:59 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3983	Good catch. Thanks!
3995	I fixed names. But I don't see reasons to use different variables here (if this is what you meant)
3998	fixed
4000	Ok. Turned your explanation to a comment.
4010	Agree. Let me remove this special case then.
4017	Please note that VecFallThrough is zero initially and set to OrigFallThroughCount only if the vector loop is expected to be executed (VecIterCount > 0).
4026	Same explanation as for the above.
llvm/test/Transforms/LoopVectorize/check-prof-info.ll
4	I replaced masked case since we don't do anything special for it now.

Harbormaster completed remote builds in B41303: Diff 230432. Nov 21 2019, 5:02 AM

ping @Ayal

Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for potential benefit of loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue though with the fact that LV may use the scalar loop for both the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope vector loop is always reached. In any case this deserves a comment.

Suggesting further variable name changes for the three Orig, Unrolled, and Remainder Loops, each having a LoopEntry==LoopExit edge weight, a Backedge weight, a HeaderBlock weight, and an AverageTripCount. The actual weights are recorded as TrueVal and FalseVal of the latch branches.

Patch needs to be clang-format'ed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3972	"is less ... than original TC" >> "is smaller than the original TC by a factor of VFxUF"
3979	OrigTakenWeight >> OrigBackedgeTakenWeight or OrigBackedgeWeight ? OrigExitWeight >> OrigLoopExitWeight? May help to also set OrigLoopEntryWeight = OrigLoopExitWeight?
3982	OrigBackBranchI >> OrigLoopLatchBranch ?
3985	It seems clearer to call extractProfMetadata(TrueVal, FalseVal) and then set BackedgeTaken and LoopExit/Entry weights according to if (OrigLoopLatchBranch->getSuccessor(0) == OrigLoop->getHeader()), following LoopUtil's getLoopEstimatedTripCount(). Analogously for createBranchWeights(TrueVal, FalseVal). In any case, better rename "IsTrueBackEdge*Loop".
3999	Patch needs to be clang-format'ed
4006	Instead of providing the explanation in a comment, seems better to implement the code this way, leaving the +1 for the compiler to optimize. I.e., const uint64_t OrigHeaderBlockWeight = OrigBackedgeTakenWeight + OrigLoopEntryWeight; const uint64_t OrigAverageTripCount = OrigHeaderBlockWeight / OrigLoopEntryWeight;
4013	Better rename/expand "PE". PEIterCount >> RemainderLoopAverageTripCount?
4017	In the general context, "Vec" >> "UnrolledLoop". VecTakenCount >> UnrolledLoopBackedgeWeight VecFallThrough >> UnrolledLoopExitWeight and/or UnrolledLoopEntryWeight
4019	How about if (UnrolledLoopAverageTripCount > 0) { UnrolledLoopEntryWeight = OrigLoopEntryWeight; uint64_t UnrolledLoopHeaderWeight = UnrolledLoopAverageTripCount * UnrolledLoopEntryWeight; // Analogous to computing OrigLoopAverageTripCount from Header and Entry weights above. UnrolledLoopBackedgeWeight = UnrolledLoopHeaderWeight - UnrolledLoopEntryWeight; } leaving the -1 optimization to the compiler.
4025	PETakenCount >> RemainderLoopBackedgeWeight PEFallThroughCount >> RemainderLoopExitWeight and/or RemainderLoopEntryWeight
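
The direction handling suggested for line 3985 can be sketched without any LLVM dependency. The types `BranchWeights` and `LatchWeights` and both conversion functions are hypothetical stand-ins: in the real code the two raw values would come from extractProfMetadata(TrueVal, FalseVal), and the flag from comparing getSuccessor(0) with the loop header.

```cpp
#include <cassert>
#include <cstdint>

// Raw weights as recorded on a conditional branch (true edge, false edge).
struct BranchWeights {
  uint64_t TrueVal;
  uint64_t FalseVal;
};

// The same weights interpreted relative to the loop (hypothetical type).
struct LatchWeights {
  uint64_t BackedgeTaken; // latch -> header
  uint64_t LoopExit;      // latch -> exit block
};

// Interpret raw branch weights according to which successor is the header.
LatchWeights toLatchWeights(BranchWeights W, bool TrueEdgeIsBackedge) {
  if (TrueEdgeIsBackedge)
    return {W.TrueVal, W.FalseVal};
  return {W.FalseVal, W.TrueVal};
}

// Inverse mapping, used when writing updated weights back to the branch.
BranchWeights toBranchWeights(LatchWeights W, bool TrueEdgeIsBackedge) {
  if (TrueEdgeIsBackedge)
    return {W.BackedgeTaken, W.LoopExit};
  return {W.LoopExit, W.BackedgeTaken};
}
```

Keeping the two mappings side by side makes it harder to swap the directions when the updated weights are written back.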

In D67905#1770563, @Ayal wrote:

Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for potential benefit of loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue though with the fact that LV may use the scalar loop for both the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope vector loop is always reached. In any case this deserves a comment.

Suggesting further variable name changes for the three Orig, Unrolled, and Remainder Loops, each having a LoopEntry==LoopExit edge weight, a Backedge weight, a HeaderBlock weight, and an AverageTripCount. The actual weights are recorded as TrueVal and FalseVal of the latch branches.

Patch needs to be clang-format'ed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3985	I'm not convinced. My point is that extra variables and conditional reassignments make the code less readable. I think this is a very subjective thing.
4019	That would make the computations more prone to overflow. Personally, I feel the way it's written today is just as easy to understand.

Addressing issues raised by Ayal.

Typo fixed.

Harbormaster completed remote builds in B42093: Diff 232778. Dec 9 2019, 2:03 AM

Harbormaster completed remote builds in B42094: Diff 232779.

fedor.sergeev added a subscriber: fedor.sergeev. Dec 9 2019, 10:17 PM

ping @Ayal

Ayal added inline comments. Dec 18 2019, 1:51 PM

llvm/include/llvm/Transforms/Utils/LoopUtils.h
361	The fact that OrigLoop is both the original loop containing the original profile weights, and acts as the RemainderLoop dedicated to leftover iterations, should be clarified. Alternatively, this utility can receive three loops: OrigLoop, UnrolledLoop and RemainderLoop, leaving it to the caller to decide if to pass OrigLoop also as RemainderLoop. Would probably be clearer to start with UnrolledLoop receiving weights that reflect TC/UF iterations, and then OrigLoop which receives weights that reflect the remaining TC%UF iterations.
363	U[n]rolledLoop, several occurrences. "\UF" >> "\p UF"
llvm/lib/Transforms/Utils/LoopUtils.cpp
1040	Worth commenting that OrigLoopEntryWeight also holds OrigLoopExitWeight, which is more clearly the weight associated with the (exit direction of the) latch branch.
1048	UnrolledBBI >> Unrolled[Loop]LatchBranch, as in OrigLoopLatchBranch. As the names end up overflowing lines, can use Orig, Unrolled and Remainder to stand for OrigLoop, UnrolledLoop and RemainderLoop; i.e., taking "Loop" out.
1051	VecLoop >> UnrolledLoop
1061	Can drop the 'const', for consistency; these temporaries are obviously const's.
1064	Note that this is rounding down. Can add half of the denominator to the numerator before dividing in order to round more accurately; this is what getLoopEstimatedTripCount() does, but it seems to be off by 1 as it computes BackEdgeTakenWeight / LoopEntryWeight rounded to nearest, instead of HeaderBlockWeight / LoopEntryWeight rounded to nearest... Simply call OrigAverageTripCount = `getLoopEstimatedTripCount(OrigLoop)`? Perhaps having a `setLoopEstimatedTripCount(Loop, EstimatedTripCount, EstimatedEntryWeight)` would help fold the identical treatment of UnrolledLoop and RemainderLoop into one function, which also takes care of figuring out the True/False vs. Backedge/Exit directions?
1066	U[n]rolledAverageTripCount
1076	Seems slightly more logical to first set UnrolledLoopEntryWeight, and then using it set UnrolledLoopBackedgeWeight.
1085	ditto
1098	(This actually replaces the old profile metadata with the new one.)
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
118	Is this include still needed here?
3456	Comment below should start with a short sentence explaining that profile weights associated with the original loop are now distributed among the vector and scalar loops.
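
The rounding and off-by-one remarks on line 1064 can be made concrete with plain integers. The function names below are hypothetical: the first estimate mirrors the backedge/exit division described in the comment, the second the header/entry division proposed instead, under the assumption that the loop entry weight equals the latch exit weight.

```cpp
#include <cassert>
#include <cstdint>

// Divide rounding to nearest by adding half of the denominator first.
uint64_t divideRoundToNearest(uint64_t Num, uint64_t Den) {
  assert(Den != 0 && "denominator must be non-zero");
  return (Num + Den / 2) / Den;
}

// Backedge weight over exit weight: how often the backedge is taken per
// loop invocation. This is one less than the number of iterations.
uint64_t estimateBackedgeTakenCount(uint64_t BackedgeWeight,
                                    uint64_t ExitWeight) {
  return divideRoundToNearest(BackedgeWeight, ExitWeight);
}

// Header weight over entry weight: how often the body runs per invocation.
// Assumes entry weight == exit weight (see the lead-in).
uint64_t estimateTripCount(uint64_t BackedgeWeight, uint64_t ExitWeight) {
  const uint64_t HeaderWeight = BackedgeWeight + ExitWeight;
  return divideRoundToNearest(HeaderWeight, ExitWeight);
}
```

Because the header weight exceeds the backedge weight by exactly one exit weight, the two estimates always differ by one in this sketch.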

ebrevnov marked 16 inline comments as done. Dec 26 2019, 3:29 AM

ebrevnov added inline comments.

llvm/lib/Transforms/Utils/LoopUtils.cpp
1040	IMHO instead of trying to clarify with a comment we better find self descriptive name for such a simple and commonly used thing. Strictly speaking OrigLoopEntryWeight != OrigLoopExitWeight. Do you find OrigBackEdgeExitWeight good enough?
1048	Removed "Loop" from most names to make them a little shorter.
1064	This "off by 1" stopped me from using it in the first place since that could be important in some cases. OK, let's reuse getLoopEstimatedTripCount. To be able to do that there are some changes to getLoopEstimatedTripCount.

Updated as requested.

Harbormaster completed remote builds in B42951: Diff 235335. Dec 26 2019, 3:32 AM

Hi @Ayal. Thanks for your input. I fixed all places as you suggested. Please check.

Thanks for making all the changes! More comments inline.

llvm/lib/Transforms/Utils/LoopUtils.cpp
680–683	Comment what this new function is for. Rename (see below)? Retain "Support loops ..." comment, added in D64553?
705	dyn_cast >> cast Perhaps update above function to do here something like `BranchInst *LatchBR = getExpectedExitLoopLatchBranch(L)` checking if it returned nullptr or not?
722	Thanks for taking care of this fix to improve accuracy of estimated TC! But doing so deserves a separate patch, and tests. Note that it also affects loop unrolling, i.e., its effects are beyond LV. This part can be introduced either before or after the part that teaches LV to maintain profiling info.
743	ditto (dyn_cast >> cast, ...)
744	Better check similar to above `if (LatchBR->getSuccessor(0) != L->getHeader())`
1036	"the \p UnrolledLoop \p RemainderLoop" >> "\p UnrolledLoop and \p RemainderLoop"
1040	May also be worthwhile asserting that UF is positive (or greater than 1?)
1040	"IMHO instead of trying to clarify with a comment we better find self descriptive name for such a simple and commonly used thing."
	Definitely agree with (the preference of) finding self descriptive variable names.
	"Strictly speaking OrigLoopEntryWeight != OrigLoopExitWeight"
	How so, given that OrigLoop has a single-entry, and an "expected" single-exit: there may be other "side/deopt" exits, but these are expected to have zero weight. E.g., when computing LoopHeaderWeight above, adding LoopEntryWeight (instead of LoopExitWeight) to BackEdgeTakenWeight seems more logical.
	"Do you find OrigBackEdgeExitWeight good enough?"
	Strictly speaking, a BackEdge cannot exit: it is an edge going from a latch block to a header block, and "BackEdgeTaken" is the number of times this edge is taken=traversed, which equals the number of times its latch branch is "taken" rather than "falls-thru" (if its true direction points to the header). Perhaps LoopEntryExitWeight or LoopInvocationWeight could be used instead/in addition.
1041	are expected to be distinct
1064	"This 'off by 1' stopped me from using it in the first place since that could be important in some cases. OK, let's reuse getLoopEstimatedTripCount. To be able to do that there are some changes to getLoopEstimatedTripCount."
	Indeed, best fix and reuse, in a dedicated patch as raised above, thereby isolating impact on such "some cases".
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3465	"is not taken into account as unlikely case" >> "is ignored, assigning all the weight to the vector loop, optimistically"
llvm/test/Transforms/LoopVectorize/check-prof-info.ll
7	Tests targeting x86 need to reside in LoopVectorize/X86

Ayal mentioned this in D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount. Dec 30 2019, 4:31 AM

One more round of updates.

ebrevnov edited parent revisions, added: D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount.; removed: D67805: [LV] Allow vectorization of hot short trip count loops with epilog. Dec 30 2019, 4:33 AM

Harbormaster completed remote builds in B43034: Diff 235579. Dec 30 2019, 4:35 AM

ebrevnov added inline comments. Dec 30 2019, 11:47 PM

llvm/lib/Transforms/Utils/LoopUtils.cpp
680–683	The comment is related to get/setLoopEstimatedTripCount and still there...
705	I was thinking about that in the first place but didn't come up with a "good" enough name. I can go with that name if you like :-)
722	Ok.
1040	I think we better support 1 which could be used in some corner cases....
1040	"How so, given that OrigLoop has a single-entry, and an 'expected' single-exit: there may be other 'side/deopt' exits, but these are expected to have zero weight. E.g., when computing LoopHeaderWeight above, adding LoopEntryWeight (instead of LoopExitWeight) to BackEdgeTakenWeight seems more logical."
	For an infinite loop, entry is 1 while exit is 0 :-). I understand this is an extreme case we will never meet, but still....
	"Strictly speaking, a BackEdge cannot exit: it is an edge going from a latch block to a header block, and 'BackEdgeTaken' is the number of times this edge is taken=traversed, which equals the number of times its latch branch is 'taken' rather than 'falls-thru' (if its true direction points to the header)."
	That's why I used TakenCount and FallThroughCount in the very first version, which perfectly matches your description. I don't feel we are getting any better names with more iterations.... Perhaps LatchCycleWeight - number of times we go to loop header from the latch and LatchExitWeight - number of times we go to loop exit from the latch?
	"Perhaps LoopEntryExitWeight or LoopInvocationWeight could be used instead/in addition."
	I think LoopEntryExitWeight may be confusing.... I think it makes sense to use EstimatedLoopInvocationWeight in conjunction with EstimatedTripCount as parameters to the get/setEstimatedTripCount interface, and LatchCycleWeight and LatchExitWeight in the implementation, as they are a little bit more low level.

Rebase

Harbormaster completed remote builds in B43561: Diff 236970. Jan 8 2020, 11:21 PM

Rebase

Harbormaster completed remote builds in B43570: Diff 236991. Jan 9 2020, 2:26 AM

ping @Ayal

This looks good to me, thanks!

llvm/lib/Transforms/Utils/LoopUtils.cpp
748	The last part could call fixupBranchWeights() if moved here from Transforms/Utils/LoopUnrollPeel.cpp
llvm/test/Transforms/LoopVectorize/tripcount.ll
211	Following this, to clarify: original loop has latchExitWeight=10 and backedgeTakenWeight=10,000, therefore estimatedBackedgeTakenCount=1,000 and estimatedTripCount=1,001. Vectorizing by 4 produces estimatedTripCounts of 1,001/4=250 and 1,001%4=1 for vectorized and remainder loops, respectively, therefore their estimatedBackedgeTakenCounts are 249 and 0, and so the weights recorded with loop invocation weights of 10 are the above {10, 2490} and {10, 0}.
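
The arithmetic in the comment on tripcount.ll can be replayed with a small sketch. `LoopWeights` and `weightsForTripCount` are hypothetical names; the rule used here (backedge weight = (trip count - 1) × invocation weight) is inferred from the numbers in the comment rather than copied from the patch, and the zero-trip-count case is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Weights recorded on a loop latch branch (hypothetical type).
struct LoopWeights {
  uint64_t LatchExit;     // times the latch exits the loop
  uint64_t BackedgeTaken; // times the latch branches back to the header
};

// Derive latch weights from an estimated trip count and the weight of
// loop invocations.
LoopWeights weightsForTripCount(uint64_t TripCount, uint64_t InvocationWeight) {
  if (TripCount == 0)
    return {0, 0}; // assumption: a loop that never runs gets zero weights
  // Each invocation takes the backedge TripCount - 1 times, then exits once.
  return {InvocationWeight, (TripCount - 1) * InvocationWeight};
}
```

With an estimated trip count of 1,001 and an invocation weight of 10, `weightsForTripCount(1001 / 4, 10)` yields {10, 2490} and `weightsForTripCount(1001 % 4, 10)` yields {10, 0}, matching the comment.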

ebrevnov marked 2 inline comments as done. Jan 15 2020, 7:17 PM

ebrevnov added inline comments.

llvm/lib/Transforms/Utils/LoopUtils.cpp
748	That would require changing the implementation of fixupBranchWeights, since it skips the update when the back edge taken count is zero.
llvm/test/Transforms/LoopVectorize/tripcount.ll
211	I will add this text to the test. Is that what you wanted (just not sure :-))?

Rebase

Harbormaster completed remote builds in B44139: Diff 238455. Jan 16 2020, 4:09 AM

Ayal added inline comments. Jan 16 2020, 11:55 PM

llvm/lib/Transforms/Utils/LoopUtils.cpp
748	Agreed. Regarding skipping the update when backedge taken weight is zero, note that in `fixupBranchWeights()`, `BackedgeTakenWeight` is called `FallThroughWeight`(?), "The weight of the edge from Latch to Header", and that its comment reads "FallThroughWeight is 0 means that there is no branch weights on original latch block or estimated trip count is zero." Regarding the first meaning of 0, whoever calls `fixupBranchWeights()` should do so only if there were such weights on the original latch block, similar to the caller of `setLoopEstimatedTripCount()`. Regarding the second meaning of 0, it seems `fixupBranchWeights()` suffers from the same +1 issue: estimating trip count to be zero when backedge taken weight is zero. It would be good to fix and centralize the support for updating weights of loops, but such refactoring can be done as a separate follow-up patch, after landing this (accepted) patch.

Closed by commit rGaf7e1588727c: [LV] Vectorizer should adjust trip count in profile information (authored by ebrevnov). Jan 20 2020, 3:39 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Utils/LoopUtils.h	13 lines
llvm/lib/Transforms/Utils/LoopUtils.cpp	70 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	8 lines
llvm/test/Transforms/LoopVectorize/check-prof-info.ll	97 lines
llvm/test/Transforms/LoopVectorize/tripcount.ll	9 lines

Diff 232778

llvm/include/llvm/Transforms/Utils/LoopUtils.h

[... 351 lines not shown ...]
 /// Returns true if \p S is defined and never is equal to signed/unsigned max.
 bool cannotBeMaxInLoop(const SCEV *S, const Loop *L, ScalarEvolution &SE,
                        bool Signed);

 /// Returns true if \p S is defined and never is equal to signed/unsigned min.
 bool cannotBeMinInLoop(const SCEV *S, const Loop *L, ScalarEvolution &SE,
                        bool Signed);

+/// Update profile info for the \p OrigLoop and \p UnrolledLoop so that original
+/// number of iterations in the \p OrigLoop (TC) are distributed as follows.
+/// \p OrigLoop gets TC%UF iterations, while rest iterations are executed as
+/// part of \p UrolledLoop. In addition, \p UrolledLoop executes blocks of \UF
+/// original iterations thus will do TC/UF iterations in total.
+///
+/// This utility may be useful for such optimizations as unroller and
+/// vectorizer as it's typical transformation for them.
+///
+/// If \p OrigLoop has no profile info associated nothing is done.
+void fixProfileInfoAfterUnrolling(Loop *OrigLoop, Loop *UrolledLoop,
+                                  uint64_t UF);
+
 } // end namespace llvm

 #endif // LLVM_TRANSFORMS_UTILS_LOOPUTILS_H

llvm/lib/Transforms/Utils/LoopUtils.cpp

Show All 26 Lines
#include "llvm/Analysis/ScalarEvolutionExpander.h"		#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
		#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
▲ Show 20 Lines • Show All 628 Lines • ▼ Show 20 Lines	if (LI) {
blocks.insert(L->block_begin(), L->block_end());		blocks.insert(L->block_begin(), L->block_end());
for (BasicBlock *BB : blocks)		for (BasicBlock *BB : blocks)
LI->removeBlock(BB);		LI->removeBlock(BB);

// The last step is to update LoopInfo now that we've eliminated this loop.		// The last step is to update LoopInfo now that we've eliminated this loop.
LI->erase(L);		LI->erase(L);
}		}
}		}

Optional<unsigned> llvm::getLoopEstimatedTripCount(Loop *L) {		Optional<unsigned> llvm::getLoopEstimatedTripCount(Loop *L) {
// Support loops with an exiting latch and other existing exists only		// Support loops with an exiting latch and other existing exists only
// deoptimize.		// deoptimize.
		AyalUnsubmitted Done Reply Inline Actions Comment what this new function is for. Rename (see below)? Retain "Support loops ..." comment, added in D64553? Ayal: Comment what this new function is for. Rename (see below)? Retain "Support loops ..." comment…
		ebrevnovAuthorUnsubmitted Done Reply Inline Actions The comment is related to get/setLoopEstimatedTripCount and still there... ebrevnov: The comment is related to get/setLoopEstimatedTripCount and still there...

// Get the branch weights for the loop's backedge.		// Get the branch weights for the loop's backedge.
BasicBlock *Latch = L->getLoopLatch();		BasicBlock *Latch = L->getLoopLatch();
if (!Latch)		if (!Latch)
return None;		return None;
BranchInst *LatchBR = dyn_cast<BranchInst>(Latch->getTerminator());		BranchInst *LatchBR = dyn_cast<BranchInst>(Latch->getTerminator());
if (!LatchBR \|\| LatchBR->getNumSuccessors() != 2 \|\| !L->isLoopExiting(Latch))		if (!LatchBR \|\| LatchBR->getNumSuccessors() != 2 \|\| !L->isLoopExiting(Latch))
return None;		return None;

assert((LatchBR->getSuccessor(0) == L->getHeader() \|\|		assert((LatchBR->getSuccessor(0) == L->getHeader() \|\|
LatchBR->getSuccessor(1) == L->getHeader()) &&		LatchBR->getSuccessor(1) == L->getHeader()) &&
"At least one edge out of the latch must go to the header");		"At least one edge out of the latch must go to the header");

SmallVector<BasicBlock *, 4> ExitBlocks;		SmallVector<BasicBlock *, 4> ExitBlocks;
L->getUniqueNonLatchExitBlocks(ExitBlocks);		L->getUniqueNonLatchExitBlocks(ExitBlocks);
if (any_of(ExitBlocks, [](const BasicBlock *EB) {		if (any_of(ExitBlocks, [](const BasicBlock *EB) {
return !EB->getTerminatingDeoptimizeCall();		return !EB->getTerminatingDeoptimizeCall();
}))		}))
return None;		return None;

// To estimate the number of times the loop body was executed, we want to		// To estimate the number of times the loop body was executed, we want to
// know the number of times the backedge was taken, vs. the number of times		// know the number of times the backedge was taken, vs. the number of times
		AyalUnsubmitted Done Reply Inline Actions dyn_cast >> cast Perhaps update above function to do here something like `BranchInst LatchBR = getExpectedExitLoopLatchBranch(L)` checking if it returned nullptr or not? Ayal:* dyn_cast >> cast Perhaps update above function to do here something like `BranchInst *LatchBR…
		ebrevnovAuthorUnsubmitted Done Reply Inline Actions I was thinking about that in the first place but didn't come up with a "good" enough name. I can go with that name if you like :-) ebrevnov: I was thinking about that in the first place but didn't come up with a "good" enough name. I…
// we exited the loop.		// we exited the loop.
uint64_t TrueVal, FalseVal;		uint64_t TrueVal, FalseVal;
if (!LatchBR->extractProfMetadata(TrueVal, FalseVal))		if (!LatchBR->extractProfMetadata(TrueVal, FalseVal))
return None;		return None;

if (!TrueVal \|\| !FalseVal)		if (!TrueVal \|\| !FalseVal)
return 0;		return 0;

// Divide the count of the backedge by the count of the edge exiting the loop,		// Divide the count of the backedge by the count of the edge exiting the loop,
// rounding to nearest.		// rounding to nearest.
if (LatchBR->getSuccessor(0) == L->getHeader())		if (LatchBR->getSuccessor(0) == L->getHeader())
return (TrueVal + (FalseVal / 2)) / FalseVal;		return (TrueVal + (FalseVal / 2)) / FalseVal;
else		else
return (FalseVal + (TrueVal / 2)) / TrueVal;		return (FalseVal + (TrueVal / 2)) / TrueVal;
}		}

bool llvm::hasIterationCountInvariantInParent(Loop *InnerLoop,		bool llvm::hasIterationCountInvariantInParent(Loop *InnerLoop,
		AyalUnsubmitted Done Reply Inline Actions Thanks for taking care of this fix to improve accuracy of estimated TC! But doing so deserves a separate patch, and tests. Note that it also effects loop unrolling, i.e., its effects are beyond LV. This part can be introduced either before or after the part that teaches LV to maintain profiling info. Ayal: Thanks for taking care of this fix to improve accuracy of estimated TC! But doing so deserves a…
		ebrevnovAuthorUnsubmitted Done Reply Inline Actions Ok. ebrevnov: Ok.
ScalarEvolution &SE) {		ScalarEvolution &SE) {
Loop *OuterL = InnerLoop->getParentLoop();		Loop *OuterL = InnerLoop->getParentLoop();
if (!OuterL)		if (!OuterL)
return true;		return true;

// Get the backedge taken count for the inner loop		// Get the backedge taken count for the inner loop
BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();		BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();
const SCEV *InnerLoopBECountSC = SE.getExitCount(InnerLoop, InnerLoopLatch);		const SCEV *InnerLoopBECountSC = SE.getExitCount(InnerLoop, InnerLoopLatch);
if (isa<SCEVCouldNotCompute>(InnerLoopBECountSC) \|\|		if (isa<SCEVCouldNotCompute>(InnerLoopBECountSC) \|\|
!InnerLoopBECountSC->getType()->isIntegerTy())		!InnerLoopBECountSC->getType()->isIntegerTy())
return false;		return false;

// Get whether count is invariant to the outer loop		// Get whether count is invariant to the outer loop
ScalarEvolution::LoopDisposition LD =		ScalarEvolution::LoopDisposition LD =
SE.getLoopDisposition(InnerLoopBECountSC, OuterL);		SE.getLoopDisposition(InnerLoopBECountSC, OuterL);
if (LD != ScalarEvolution::LoopInvariant)		if (LD != ScalarEvolution::LoopInvariant)
return false;		return false;

return true;		return true;
}		}

		AyalUnsubmitted Done Reply Inline Actions ditto (dyn_cast >> cast, ...) Ayal: ditto (dyn_cast >> cast, ...)
Value *llvm::createMinMaxOp(IRBuilder<> &Builder,		Value *llvm::createMinMaxOp(IRBuilder<> &Builder,
		AyalUnsubmitted Done Reply Inline Actions Better check similar to above `if (LatchBR->getSuccessor(0) != L->getHeader())` Ayal: Better check similar to above `if (LatchBR->getSuccessor(0) != L->getHeader())`
RecurrenceDescriptor::MinMaxRecurrenceKind RK,		RecurrenceDescriptor::MinMaxRecurrenceKind RK,
Value *Left, Value *Right) {		Value *Left, Value *Right) {
CmpInst::Predicate P = CmpInst::ICMP_NE;		CmpInst::Predicate P = CmpInst::ICMP_NE;
switch (RK) {		switch (RK) {
		Ayal (Not Done): The last part could call fixupBranchWeights() if moved here from Transforms/Utils/LoopUnrollPeel.cpp
		ebrevnov (Done): That would require changing the implementation of fixupBranchWeights, since it skips the update when the back edge taken count is zero.
		Ayal (Not Done): Agreed. Regarding skipping the update when the backedge taken weight is zero, note that in `fixupBranchWeights()`, `BackedgeTakenWeight` is called `FallThroughWeight`(?), "The weight of the edge from Latch to Header", and that
		> // FallThroughWeight is 0 means that there is no branch weights on original
		> // latch block or estimated trip count is zero.
		Regarding the first meaning of 0, whoever calls `fixupBranchWeights()` should do so only if there were such weights on the original latch block, similar to the caller of `setLoopEstimatedTripCount()`. Regarding the second meaning of 0, it seems `fixupBranchWeights()` suffers from the same +1 issue: estimating the trip count to be zero when the backedge taken weight is zero. It would be good to fix and centralize the support for updating weights of loops, but such refactoring can be done as a separate follow-up patch, after landing this (accepted) patch.
default:		default:
llvm_unreachable("Unknown min/max recurrence kind");		llvm_unreachable("Unknown min/max recurrence kind");
case RecurrenceDescriptor::MRK_UIntMin:		case RecurrenceDescriptor::MRK_UIntMin:
P = CmpInst::ICMP_ULT;		P = CmpInst::ICMP_ULT;
break;		break;
case RecurrenceDescriptor::MRK_UIntMax:		case RecurrenceDescriptor::MRK_UIntMax:
P = CmpInst::ICMP_UGT;		P = CmpInst::ICMP_UGT;
break;		break;
▲ Show 20 Lines • Show All 270 Lines • ▼ Show 20 Lines	bool llvm::cannotBeMaxInLoop(const SCEV *S, const Loop *L, ScalarEvolution &SE,
unsigned BitWidth = cast<IntegerType>(S->getType())->getBitWidth();		unsigned BitWidth = cast<IntegerType>(S->getType())->getBitWidth();
APInt Max = Signed ? APInt::getSignedMaxValue(BitWidth) :		APInt Max = Signed ? APInt::getSignedMaxValue(BitWidth) :
APInt::getMaxValue(BitWidth);		APInt::getMaxValue(BitWidth);
auto Predicate = Signed ? ICmpInst::ICMP_SLT : ICmpInst::ICMP_ULT;		auto Predicate = Signed ? ICmpInst::ICMP_SLT : ICmpInst::ICMP_ULT;
return SE.isAvailableAtLoopEntry(S, L) &&		return SE.isAvailableAtLoopEntry(S, L) &&
SE.isLoopEntryGuardedByCond(L, Predicate, S,		SE.isLoopEntryGuardedByCond(L, Predicate, S,
SE.getConstant(Max));		SE.getConstant(Max));
}		}

		/// Update profile info for the \p OrigLoop and \p UnrolledLoop.
		Ayal (Done): "the \p UnrolledLoop \p RemainderLoop" >> "\p UnrolledLoop and \p RemainderLoop"
		void llvm::fixProfileInfoAfterUnrolling(Loop *OrigLoop, Loop *UnrolledLoop,
		uint64_t UF) {
		uint64_t OrigBackedgeTakenWeight = 0;
		uint64_t OrigLoopEntryWeight = 0;
		Ayal (Done): Worth commenting that OrigLoopEntryWeight also holds OrigLoopExitWeight, which is more clearly the weight associated with the (exit direction of the) latch branch.
		ebrevnov (Done): IMHO instead of trying to clarify with a comment we better find self descriptive name for such a simple and commonly used thing. Strictly speaking OrigLoopEntryWeight != OrigLoopExitWeight. Do you find OrigBackEdgeExitWeight good enough?
		Ayal (Done):
		> IMHO instead of trying to clarify with a comment we better find self descriptive name for such a simple and commonly used thing.
		Definitely agree with (the preference of) finding self descriptive variable names.
		> Strictly speaking OrigLoopEntryWeight != OrigLoopExitWeight
		How so, given that OrigLoop has a single-entry, and an "expected" single-exit: there may be other "side/deopt" exits, but these are expected to have zero weight. E.g., when computing LoopHeaderWeight above, adding LoopEntryWeight (instead of LoopExitWeight) to BackEdgeTakenWeight seems more logical.
		> Do you find OrigBackEdgeExitWeight good enough?
		Strictly speaking, a BackEdge cannot exit: it is an edge going from a latch block to a header block, and "BackEdgeTaken" is the number of times this edge is taken=traversed, which equals the number of times its latch branch is "taken" rather than "falls-thru" (if its true direction points to the header). Perhaps LoopEntryExitWeight or LoopInvocationWeight could be used instead/in addition.
		ebrevnov (Done):
		> How so, given that OrigLoop has a single-entry, and an "expected" single-exit: there may be other "side/deopt" exits, but these are expected to have zero weight. E.g., when computing LoopHeaderWeight above, adding LoopEntryWeight (instead of LoopExitWeight) to BackEdgeTakenWeight seems more logical.
		For an infinite loop entry is 1 while exit is 0 :-). I understand this is an extreme we will never meet but still....
		> Strictly speaking, a BackEdge cannot exit: it is an edge going from a latch block to a header block, and "BackEdgeTaken" is the number of times this edge is taken=traversed, which equals the number of times its latch branch is "taken" rather than "falls-thru" (if its true direction points to the header).
		That's why I used TakenCount and FallThroughCount in the very first version, which perfectly matches your description. I don't feel we are getting any better names with more iterations.... Perhaps LatchCycleWeight - number of times we go to loop header from the latch and LatchExitWeight - number of times we go to loop exit from the latch?
		> Perhaps LoopEntryExitWeight or LoopInvocationWeight could be used instead/in addition.
		I think LoopEntryExitWeight may be confusing.... I think it makes sense to use EstimatedLoopInvocationWeight in conjunction with EstimatedTripCount as parameters to the get/setEstimatedTripCount interface, and LatchCycleWeight and LatchExitWeight in the implementation, as they are a little bit more low level.
		Ayal (Done): May also be worthwhile asserting that UF is positive (or greater than 1?)
		ebrevnov (Done): I think we better support 1 which could be used in some corner cases....
		auto *OrigLoopLatchBranch = OrigLoop->getLoopLatch()->getTerminator();
		Ayal (Done): are expected to be distinct

		if (!OrigLoopLatchBranch->extractProfMetadata(OrigBackedgeTakenWeight,
		OrigLoopEntryWeight))
		return;

		MDBuilder MDB(OrigLoopLatchBranch->getContext());
		auto *UnrolledBBI = UnrolledLoop->getLoopLatch()->getTerminator();
		Ayal (Done): UnrolledBBI >> Unrolled[Loop]LatchBranch, as in OrigLoopLatchBranch. As the names end up overflowing lines, can use Orig, Unrolled and Remainder to stand for OrigLoop, UnrolledLoop and RemainderLoop; i.e., taking "Loop" out.
		ebrevnov (Done): Removed "Loop" from most names to make them a little shorter.
		bool IsTrueBackEdgeOrigLoop =
		OrigLoop->contains(*succ_begin(OrigLoop->getLoopLatch()));
		bool IsTrueBackEdgeVecLoop =
		Ayal (Done): VecLoop >> UnrolledLoop
		UnrolledLoop->contains(*succ_begin(UnrolledLoop->getLoopLatch()));

		if (!IsTrueBackEdgeOrigLoop)
		std::swap(OrigBackedgeTakenWeight, OrigLoopEntryWeight);

		if (OrigLoopEntryWeight == 0)
		return;

		// Calculate number of iterations in the original scalar loop.
		const uint64_t OrigHeaderBlockWeight =
		Ayal (Done): Can drop the 'const', for consistency; these temporaries are obviously const's.
		OrigBackedgeTakenWeight + OrigLoopEntryWeight;
		const uint64_t OrigAverageTripCount =
		OrigHeaderBlockWeight / OrigLoopEntryWeight;
		Ayal (Done): Note that this is rounding down. Can add half of the denominator to the numerator before dividing in order to round more accurately; this is what getLoopEstimatedTripCount() does, but it seems to be off by 1 as it computes BackEdgeTakenWeight / LoopEntryWeight rounded to nearest, instead of HeaderBlockWeight / LoopEntryWeight rounded to nearest... Simply call OrigAverageTripCount = `getLoopEstimatedTripCount(OrigLoop)`? Perhaps having a `setLoopEstimatedTripCount(Loop, EstimatedTripCount, EstimatedEntryWeight)` would help fold the identical treatment of UnrolledLoop and RemainderLoop into one function, which also takes care of figuring out the True/False vs. Backedge/Exit directions?
		ebrevnov (Done): This "off by 1" stopped me from using it in the first place since that could be important in some cases. OK, let's reuse getLoopEstimatedTripCount. To be able to do that there are some changes to getLoopEstimatedTripCount.
		Ayal (Not Done):
		> This "off by 1" stopped me from using it in the first place since that could be important in some cases. OK, let's reuse getLoopEstimatedTripCount. To be able to do that there are some changes to getLoopEstimatedTripCount.
		Indeed, best fix and reuse, in a dedicated patch as raised above, thereby isolating impact on such "some cases".
		// Calculate number of iterations in unrolled loop.
		uint64_t UrollAverageTripCount = OrigAverageTripCount / UF;
		Ayal (Done): U[n]rolledAverageTripCount
		// Calculate number of iterations for remainder loop.
		uint64_t RemainderAverageTripCount = OrigAverageTripCount % UF;

		// Calculate taken and fall through counts for unrolled loop.
		uint64_t UnrolledLoopBackedgeWeight = 0;
		uint64_t UnrolledLoopEntryWeight = 0;
		if (UrollAverageTripCount > 0) {
		UnrolledLoopBackedgeWeight =
		(UrollAverageTripCount - 1) * OrigLoopEntryWeight;
		UnrolledLoopEntryWeight = OrigLoopEntryWeight;
		Ayal (Done): Seems slightly more logical to first set UnrolledLoopEntryWeight, and then use it to set UnrolledLoopBackedgeWeight.
		}

		// Now calculate counters for remainder loop.
		uint64_t RemainderLoopEntryWeight = 0;
		uint64_t RemainderLoopBackedgeWeight = 0;
		if (RemainderAverageTripCount > 0) {
		RemainderLoopEntryWeight =
		(RemainderAverageTripCount - 1) * OrigLoopEntryWeight;
		RemainderLoopBackedgeWeight = OrigLoopEntryWeight;
		Ayal (Done): ditto
		}

		// Make a swap if back edge is taken when condition "false".
		if (!IsTrueBackEdgeVecLoop)
		std::swap(UnrolledLoopBackedgeWeight, UnrolledLoopEntryWeight);
		// Set new profile metadata.
		UnrolledBBI->setMetadata(LLVMContext::MD_prof,
		MDB.createBranchWeights(UnrolledLoopBackedgeWeight,
		UnrolledLoopEntryWeight));
		// Make a swap if back edge is taken when condition "false".
		if (!IsTrueBackEdgeOrigLoop)
		std::swap(RemainderLoopEntryWeight, RemainderLoopBackedgeWeight);
		// Set new profile metadata.
		Ayal (Done): (This actually replaces the old profile metadata with the new one.)
		OrigLoopLatchBranch->setMetadata(
		LLVMContext::MD_prof,
		MDB.createBranchWeights(RemainderLoopEntryWeight,
		RemainderLoopBackedgeWeight));
		}
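
The arithmetic in `fixProfileInfoAfterUnrolling` above can be tried in isolation. The following is a minimal standalone sketch, not LLVM code: `LatchWeights`, `SplitWeights`, and every function name in it are hypothetical, and it assumes the exit weight of the latch equals the loop entry weight, as discussed in the review. It shows both the round-down trip count used in this revision and the round-to-nearest variant suggested by the reviewer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two profile weights on a loop latch branch.
struct LatchWeights {
  uint64_t Backedge; // times the latch jumps back to the header
  uint64_t Exit;     // times the latch leaves the loop (assumed == entries)
};

// Header executions divided by loop entries, rounded down as in this patch.
uint64_t averageTripCount(LatchWeights W) {
  assert(W.Exit > 0 && "caller must reject a zero exit weight");
  return (W.Backedge + W.Exit) / W.Exit;
}

// The same ratio rounded to nearest, per the review suggestion.
uint64_t averageTripCountNearest(LatchWeights W) {
  assert(W.Exit > 0 && "caller must reject a zero exit weight");
  return (W.Backedge + W.Exit + W.Exit / 2) / W.Exit;
}

// Weights for a loop entered EntryWeight times with the given trip count.
LatchWeights weightsFor(uint64_t TripCount, uint64_t EntryWeight) {
  if (TripCount == 0)
    return {0, 0}; // loop is not expected to run at all
  return {(TripCount - 1) * EntryWeight, EntryWeight};
}

struct SplitWeights {
  LatchWeights Unrolled;
  LatchWeights Remainder;
};

// Distribute the original iterations between unrolled and remainder loops.
SplitWeights splitWeights(LatchWeights Orig, uint64_t UF) {
  assert(UF > 0 && "unroll factor must be positive");
  uint64_t TC = averageTripCount(Orig);
  return {weightsFor(TC / UF, Orig.Exit), weightsFor(TC % UF, Orig.Exit)};
}
```

For a loop with weights 1023 (backedge) and 1 (exit) and a factor of 4, this gives 255 and 1 for the unrolled loop and zero weights for the remainder loop, which appears to be what the `LP1_255` and `LP0_0` names in the new test refer to.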

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

This file is larger than 256 KB, so syntax highlighting is disabled by default.

Show First 20 Lines • Show All 109 Lines • ▼ Show 20 Lines
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
		#include "llvm/IR/MDBuilder.h"
		Ayal (Done): Is this include still needed here?
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Use.h"		#include "llvm/IR/Use.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
▲ Show 20 Lines • Show All 3,321 Lines • ▼ Show 20 Lines	fixupIVUsers(Entry.first, Entry.second,
IVEndValues[Entry.first], LoopMiddleBlock);		IVEndValues[Entry.first], LoopMiddleBlock);

fixLCSSAPHIs();		fixLCSSAPHIs();
for (Instruction *PI : PredicatedInstructions)		for (Instruction *PI : PredicatedInstructions)
sinkScalarOperands(&*PI);		sinkScalarOperands(&*PI);

// Remove redundant induction instructions.		// Remove redundant induction instructions.
cse(LoopVectorBody);		cse(LoopVectorBody);

		Ayal (Done): Comment below should start with a short sentence explaining that profile weights associated with the original loop are now distributed among the vector and scalar loops.
		// For cases like foldTailByMasking() and requiresScalarEpilogue() we may
		// end up getting slightly roughened result but that should be OK since
		// profile is not inherently precise anyway. Note also that a possible bypass
		// of vector code caused by legality checks is not taken into account, as it
		// is an unlikely case.
		fixProfileInfoAfterUnrolling(LI->getLoopFor(LoopScalarBody),
		LI->getLoopFor(LoopVectorBody), VF * UF);
}		}

void InnerLoopVectorizer::fixCrossIterationPHIs() {		void InnerLoopVectorizer::fixCrossIterationPHIs() {
		Ayal (Done): "is not taken into account as unlikely case" >> "is ignored, assigning all the weight to the vector loop, optimistically"
// In order to support recurrences we need to be able to vectorize Phi nodes.		// In order to support recurrences we need to be able to vectorize Phi nodes.
// Phi nodes have cycles, so we need to vectorize them in two stages. This is		// Phi nodes have cycles, so we need to vectorize them in two stages. This is
// stage #2: We now need to fix the recurrences by adding incoming edges to		// stage #2: We now need to fix the recurrences by adding incoming edges to
// the currently empty PHI nodes. At this point every instruction in the		// the currently empty PHI nodes. At this point every instruction in the
// original loop is widened to a vector form so we can use them to construct		// original loop is widened to a vector form so we can use them to construct
// the incoming edges.		// the incoming edges.
for (PHINode &Phi : OrigLoop->getHeader()->phis()) {		for (PHINode &Phi : OrigLoop->getHeader()->phis()) {
// Handle first-order recurrences and reductions that need to be fixed.		// Handle first-order recurrences and reductions that need to be fixed.
▲ Show 20 Lines • Show All 490 Lines • ▼ Show 20 Lines	for (unsigned i = 0; i < NumIncomingValues; ++i) {

// Scalar incoming value may need a broadcast		// Scalar incoming value may need a broadcast
Value *NewIncV = getOrCreateVectorValue(ScIncV, 0);		Value *NewIncV = getOrCreateVectorValue(ScIncV, 0);
NewPhi->addIncoming(NewIncV, NewPredBB);		NewPhi->addIncoming(NewIncV, NewPredBB);
}		}
}		}
}		}

void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,		void InnerLoopVectorizer::widenPHIInstruction(Instruction *PN, unsigned UF,
		Ayal (Done): "is less ... than original TC" >> "is smaller than the original TC by a factor of VFxUF"
unsigned VF) {		unsigned VF) {
PHINode *P = cast<PHINode>(PN);		PHINode *P = cast<PHINode>(PN);
		DaniilSuchkov (Not Done): Nit: VFxUF - 1
if (EnableVPlanNativePath) {		if (EnableVPlanNativePath) {
// Currently we enter here in the VPlan-native path for non-induction		// Currently we enter here in the VPlan-native path for non-induction
// PHIs where all control flow is uniform. We simply widen these PHIs.		// PHIs where all control flow is uniform. We simply widen these PHIs.
// Create a vector phi with no operands - the vector phi operands will be		// Create a vector phi with no operands - the vector phi operands will be
// set at the end of vector code generation.		// set at the end of vector code generation.
		Ayal (Done): OrigTakenWeight >> OrigBackedgeTakenWeight or OrigBackedgeWeight? OrigExitWeight >> OrigLoopExitWeight? May help to also set OrigLoopEntryWeight = OrigLoopExitWeight?
Type *VecTy =		Type *VecTy =
(VF == 1) ? PN->getType() : VectorType::get(PN->getType(), VF);		(VF == 1) ? PN->getType() : VectorType::get(PN->getType(), VF);
Value *VecPhi = Builder.CreatePHI(VecTy, PN->getNumOperands(), "vec.phi");		Value *VecPhi = Builder.CreatePHI(VecTy, PN->getNumOperands(), "vec.phi");
		DaniilSuchkov (Not Done): Style: it is usually advised to turn such conditions into early exits; it would reduce the required indentation and slightly improve readability.
		Ayal (Done): OrigBackBranchI >> OrigLoopLatchBranch?
VectorLoopValueMap.setVectorValue(P, 0, VecPhi);		VectorLoopValueMap.setVectorValue(P, 0, VecPhi);
		Ayal (Not Done): OrigFallThroughCount can still be either the exit count or the continue-to-next-iteration count, according to the code below. Wait to test if it's zero until we know what it stands for?
		ebrevnov (Done): Good catch. Thanks!
OrigPHIsToFix.push_back(P);		OrigPHIsToFix.push_back(P);

		Ayal (Not Done): It seems clearer to call extractProfMetadata(TrueVal, FalseVal) and then set BackedgeTaken and LoopExit/Entry weights according to if (OrigLoopLatchBranch->getSuccessor(0) == OrigLoop->getHeader()), following LoopUtil's getLoopEstimatedTripCount(). Analogously for createBranchWeights(TrueVal, FalseVal). In any case, better rename "IsTrueBackEdgeLoop".
		ebrevnov (Not Done): Don't feel convinced. My point would be that extra variables and conditional reassignments make the code less readable. I think this is a very subjective thing.
return;		return;
}		}

assert(PN->getParent() == OrigLoop->getHeader() &&		assert(PN->getParent() == OrigLoop->getHeader() &&
"Non-header phis should have been handled elsewhere");		"Non-header phis should have been handled elsewhere");

// In order to support recurrences we need to be able to vectorize Phi nodes.		// In order to support recurrences we need to be able to vectorize Phi nodes.
// Phi nodes have cycles, so we need to vectorize them in two stages. This is		// Phi nodes have cycles, so we need to vectorize them in two stages. This is
// stage #1: We create a new vector PHI node with no incoming edges. We'll use		// stage #1: We create a new vector PHI node with no incoming edges. We'll use
// this value when we vectorize all of the instructions that use the PHI.		// this value when we vectorize all of the instructions that use the PHI.
		Ayal (Not Done): Better use distinct names, e.g., OrigExitCount and OrigBackedgeTakenCount, than continue to call them Taken and FallThrough. Perhaps use Weight instead of Count, to denote total profile frequencies, as the latter is used elsewhere to denote the actual per-invocation TripCount.
		ebrevnov (Done): I fixed names. But I don't see reasons to use different variables here (if this is what you meant)
if (Legal->isReductionVariable(P) || Legal->isFirstOrderRecurrence(P)) {		if (Legal->isReductionVariable(P) || Legal->isFirstOrderRecurrence(P)) {
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
// This is phase one of vectorizing PHIs.		// This is phase one of vectorizing PHIs.
		Ayal (Not Done): bel[l]ow
		ebrevnov (Done): fixed
Type *VecTy =		Type *VecTy =
		Ayal (Done): Patch needs to be clang-format'ed
(VF == 1) ? PN->getType() : VectorType::get(PN->getType(), VF);		(VF == 1) ? PN->getType() : VectorType::get(PN->getType(), VF);
		Ayal (Not Done): How about "OrigAverageTripCount"? Explanation about its computation: OrigAverageTripCount = (number of times header block was executed) / (number of times header was reached from pre-header == number of times latch exited) == (OrigTakenCount + OrigFallThroughCount) / OrigFallThroughCount == OrigTakenCount / OrigFallThroughCount + 1.
		ebrevnov (Done): Ok. Turned your explanation into a comment.
Value *EntryPart = PHINode::Create(		Value *EntryPart = PHINode::Create(
		DaniilSuchkov (Not Done): Maybe introduce a new variable for this value (like EpilogueTakenCount)? Right now it's a bit surprising that OrigSomething is being changed. Same goes for OrigFallThroughCount.
VecTy, 2, "vec.phi", &*LoopVectorBody->getFirstInsertionPt());		VecTy, 2, "vec.phi", &*LoopVectorBody->getFirstInsertionPt());
		Ayal (Not Done): How about VecAverageTripCount = OrigAverageTripCount / (VF * UF);
VectorLoopValueMap.setVectorValue(P, Part, EntryPart);		VectorLoopValueMap.setVectorValue(P, Part, EntryPart);
}		}
return;		return;
}		}
		Ayal (Not Done): Just to clarify, maintaining branch frequencies through optimizations is best-effort and imprecise - a total weight that does not divide VF*UF implies that the trip count of at least one invocation did not divide VF*UF, not necessarily all of them; w/o considering also the distribution of trip counts in addition to their sum. Setting PEIterCount = 0 and VecAverageTripCount = round(OrigAverageTripCount / (VF*UF)) when Cost->foldTailByMasking() is probably the best that can be done. The former is redundant given that it applies to dead code, and the latter should perhaps apply to all cases, in general.
		Ayal (Done): Instead of providing the explanation in a comment, seems better to implement the code this way, leaving the +1 for the compiler to optimize. I.e., `const uint64_t OrigHeaderBlockWeight = OrigBackedgeTakenWeight + OrigLoopEntryWeight;` `const uint64_t OrigAverageTripCount = OrigHeaderBlockWeight / OrigLoopEntryWeight;`

setDebugLocFromInst(Builder, P);		setDebugLocFromInst(Builder, P);

// This PHINode must be an induction variable.		// This PHINode must be an induction variable.
		Ayal (Not Done): There's also the special case of requiresScalarEpilogue() where 0 < PEIterCount <= VF*UF for each invocation of the loop, and hence the average is also strictly positive FWIW. But best keep the approximation general instead of trying to improve it, given general lack of information.
		ebrevnov (Done): Agree. Let me remove this special case then.
// Make sure that we know about it.		// Make sure that we know about it.
assert(Legal->getInductionVars()->count(P) && "Not an induction variable");		assert(Legal->getInductionVars()->count(P) && "Not an induction variable");

		Ayal (Done): Better rename/expand "PE". PEIterCount >> RemainderLoopAverageTripCount?
InductionDescriptor II = Legal->getInductionVars()->lookup(P);		InductionDescriptor II = Legal->getInductionVars()->lookup(P);
const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout();		const DataLayout &DL = OrigLoop->getHeader()->getModule()->getDataLayout();

// FIXME: The newly created binary instructions should contain nsw/nuw flags,		// FIXME: The newly created binary instructions should contain nsw/nuw flags,
		Ayal (Not Done): This assumes the number of times the vector loop will be reached is equal to the number of times the original scalar loop was reached (OrigFallThroughCount). This holds if Cost->foldTailByMasking(), but otherwise invocations whose trip count < VF*UF will bypass the vector loop (and also == VF*UF if requiresScalarEpilogue()), plus other run time guards.
		ebrevnov (Done): Please note that VecFallThrough is zero initially and set to OrigFallThroughCount only if the vector loop is expected to be executed (VecIterCount > 0)
		Ayal (Done): In the general context, "Vec" >> "UnrolledLoop". VecTakenCount >> UnrolledLoopBackedgeWeight, VecFallThrough >> UnrolledLoopExitWeight and/or UnrolledLoopEntryWeight
// which can be found from the original scalar operations.		// which can be found from the original scalar operations.
switch (II.getKind()) {		switch (II.getKind()) {
		Ayal (Not Done): How about `if (UnrolledLoopAverageTripCount > 0) { UnrolledLoopEntryWeight = OrigLoopEntryWeight; uint64_t UnrolledLoopHeaderWeight = UnrolledLoopAverageTripCount * UnrolledLoopEntryWeight; /* Analogous to computing OrigLoopAverageTripCount from Header and Entry weights above. */ UnrolledLoopBackedgeWeight = UnrolledLoopHeaderWeight - UnrolledLoopEntryWeight; }` leaving the -1 optimization to the compiler.
		ebrevnov (Done): That will make computations less stable to overflow. Personally I feel the way it's written today has the same level of complexity for understanding.
case InductionDescriptor::IK_NoInduction:		case InductionDescriptor::IK_NoInduction:
llvm_unreachable("Unknown induction");		llvm_unreachable("Unknown induction");
case InductionDescriptor::IK_IntInduction:		case InductionDescriptor::IK_IntInduction:
case InductionDescriptor::IK_FpInduction:		case InductionDescriptor::IK_FpInduction:
llvm_unreachable("Integer/fp induction is handled elsewhere.");		llvm_unreachable("Integer/fp induction is handled elsewhere.");
case InductionDescriptor::IK_PtrInduction: {		case InductionDescriptor::IK_PtrInduction: {
		Ayal (Done): PETakenCount >> RemainderLoopBackedgeWeight, PEFallThroughCount >> RemainderLoopExitWeight and/or RemainderLoopEntryWeight
// Handle the pointer induction variable case.		// Handle the pointer induction variable case.
		Ayal (Not Done): Similar to above comment, invocations whose trip count divides VF*UF will bypass the scalar remainder loop (w/o foldTailByMasking nor requiresScalarEpilogue), so in general PEFallThroughCount <= OrigFallThroughCount.
		ebrevnov (Done): Same explanation as for the above.
assert(P->getType()->isPointerTy() && "Unexpected type.");		assert(P->getType()->isPointerTy() && "Unexpected type.");
// This is the normalized GEP that starts counting at zero.		// This is the normalized GEP that starts counting at zero.
Value *PtrInd = Induction;		Value *PtrInd = Induction;
PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType());		PtrInd = Builder.CreateSExtOrTrunc(PtrInd, II.getStep()->getType());
// Determine the number of scalars we need to generate for each unroll		// Determine the number of scalars we need to generate for each unroll
// iteration. If the instruction is uniform, we only need to generate the		// iteration. If the instruction is uniform, we only need to generate the
// first lane. Otherwise, we generate all VF values.		// first lane. Otherwise, we generate all VF values.
unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF;		unsigned Lanes = Cost->isUniformAfterVectorization(P, VF) ? 1 : VF;

llvm/test/Transforms/LoopVectorize/check-prof-info.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=1 -S < %s | FileCheck %s
				; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=4 -S < %s | FileCheck %s -check-prefix=CHECK-MASKED

Ayal (Not Done): May want to also check with UF>1.
ebrevnov (Author, Done): I replaced the masked case since we don't do anything special for it now.
				target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
				target triple = "x86_64-unknown-linux-gnu"

Ayal (Not Done): Tests targeting x86 need to reside in LoopVectorize/X86.
				@a = dso_local global [1024 x i32] zeroinitializer, align 16
				@b = dso_local global [1024 x i32] zeroinitializer, align 16

				; Check correctness of profile info for vectorization without epilog.
				; Function Attrs: nofree norecurse nounwind uwtable
				define dso_local void @_Z3foov() local_unnamed_addr #0 {
				; CHECK-LABEL: @_Z3foov(
				; CHECK: [[VECTOR_BODY:vector\.body]]:
				; CHECK: br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_255:\!.*]],
				; CHECK: [[FOR_BODY:for\.body]]:
				; CHECK: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP0_0:\!.*]],
				; CHECK-MASKED: [[VECTOR_BODY:vector\.body]]:
				; CHECK-MASKED: br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_63:\!.*]],
				; CHECK-MASKED: [[FOR_BODY:for\.body]]:
				; CHECK-MASKED: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP0_0:\!.*]],
				;
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds [1024 x i32], [1024 x i32]* @b, i64 0, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4, !tbaa !2
				%1 = trunc i64 %indvars.iv to i32
				%mul = mul nsw i32 %0, %1
				%arrayidx2 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %indvars.iv
				%2 = load i32, i32* %arrayidx2, align 4, !tbaa !2
				%add = add nsw i32 %2, %mul
				store i32 %add, i32* %arrayidx2, align 4, !tbaa !2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.cond.cleanup, label %for.body, !prof !6
				}

				; Check correctness of profile info for vectorization with epilog.
				; Function Attrs: nofree norecurse nounwind uwtable
				define dso_local void @_Z3foo2v() local_unnamed_addr #0 {
				; CHECK-LABEL: @_Z3foo2v(
				; CHECK: [[VECTOR_BODY:vector\.body]]:
				; CHECK: br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_255:\!.*]],
				; CHECK: [[FOR_BODY:for\.body]]:
				; CHECK: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP1_2:\!.*]],
				; CHECK-MASKED: [[VECTOR_BODY:vector\.body]]:
				; CHECK-MASKED: br i1 [[TMP:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP1_63:\!.*]],
				; CHECK-MASKED: [[FOR_BODY:for\.body]]:
				; CHECK-MASKED: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP1_2:\!.*]],
				;
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds [1024 x i32], [1024 x i32]* @b, i64 0, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4, !tbaa !2
				%1 = trunc i64 %indvars.iv to i32
				%mul = mul nsw i32 %0, %1
				%arrayidx2 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %indvars.iv
				%2 = load i32, i32* %arrayidx2, align 4, !tbaa !2
				%add = add nsw i32 %2, %mul
				store i32 %add, i32* %arrayidx2, align 4, !tbaa !2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1027
				br i1 %exitcond, label %for.cond.cleanup, label %for.body, !prof !7
				}

				attributes #0 = { "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				; CHECK: [[LP1_255]] = !{!"branch_weights", i32 1, i32 255}
				; CHECK: [[LP0_0]] = !{!"branch_weights", i32 0, i32 0}
				; CHECK-MASKED: [[LP1_63]] = !{!"branch_weights", i32 1, i32 63}
				; CHECK-MASKED: [[LP0_0]] = !{!"branch_weights", i32 0, i32 0}
				; CHECK: [[LP1_2]] = !{!"branch_weights", i32 1, i32 2}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project c292b5b5e059e6ce3e6449e6827ef7e1037c21c4)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C++ TBAA"}
				!6 = !{!"branch_weights", i32 1, i32 1023}
				!7 = !{!"branch_weights", i32 1, i32 1026}

llvm/test/Transforms/LoopVectorize/tripcount.ll

[...]
for.end: ; preds = %for.body
  ret i32 0
}

define i32 @foo_low_trip_count3(i1 %cond, i32 %bound) !prof !0 {
; The loop has low invocation count compare to the function invocation count,
; but has a high trip count per invocation. Vectorize it.

; CHECK-LABEL: @foo_low_trip_count3(
- ; CHECK: vector.body:
+ ; CHECK: [[VECTOR_BODY:vector\.body]]:
+ ; CHECK: br i1 [[TMP9:%.*]], label [[MIDDLE_BLOCK:%.*]], label %[[VECTOR_BODY]], !prof [[LP3:\!.*]],
+ ; CHECK: [[FOR_BODY:for\.body]]:
+ ; CHECK: br i1 [[EXITCOND:%.*]], label [[FOR_END_LOOPEXIT:%.*]], label %[[FOR_BODY]], !prof [[LP6:\!.*]],
entry:
  br i1 %cond, label %for.preheader, label %for.end, !prof !2

for.preheader:
  br label %for.body

for.body: ; preds = %for.body, %entry
  %i.08 = phi i32 [ 0, %for.preheader ], [ %inc, %for.body ]
[...]
for.body: ; preds = %for.body, %entry
  %inc = add nsw i32 %i.08, 1
  %exitcond = icmp slt i32 %i.08, 1000
  br i1 %exitcond, label %for.body, label %for.end, !prof !1

for.end: ; preds = %for.body
  ret i32 0
}

+ ; CHECK: [[LP3]] = !{!"branch_weights", i32 10, i32 2490}
+ ; CHECK: [[LP6]] = !{!"branch_weights", i32 10, i32 0}
Ayal (Not Done): Following this, to clarify: original loop has latchExitWeight=10 and backedgeTakenWeight=10,000, therefore estimatedBackedgeTakenCount=1,000 and estimatedTripCount=1,001. Vectorizing by 4 produces estimatedTripCounts of 1,001/4=250 and 1,001%4=1 for vectorized and remainder loops, respectively, therefore their estimatedBackedgeTakenCounts are 249 and 0, and so the weights recorded with loop invocation weights of 10 are the above {10, 2490} and {10, 0}.
ebrevnov (Author, Done): I will add this text to the test. Is that what you wanted (just not sure :-))?

!0 = !{!"function_entry_count", i64 100}
!1 = !{!"branch_weights", i32 100, i32 0}
!2 = !{!"branch_weights", i32 10, i32 90}
!3 = !{!"branch_weights", i32 10, i32 10000}
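
The arithmetic in Ayal's comment, and the weights expected in check-prof-info.ll, can be reproduced with a small standalone sketch. This is not the patch's implementation in LoopUtils.cpp: `BranchWeights`, `SplitWeights`, `estimatedTripCount`, `weightsFor`, and `splitForVectorization` are hypothetical names invented here, and plain integer division is assumed where the real code may round differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type for this sketch; it is not an LLVM class.
struct BranchWeights {
  uint64_t Exit;     // times the latch branch leaves the loop
  uint64_t Backedge; // times the latch branch returns to the header
};

// Estimated iterations per invocation: backedges taken per exit, plus one.
// Assumption: a zero exit weight is treated as "never entered" (trip count 0).
uint64_t estimatedTripCount(BranchWeights W) {
  if (W.Exit == 0)
    return 0;
  return W.Backedge / W.Exit + 1;
}

// Weights for a loop entered `Invocations` times that runs `TripCount`
// iterations each time. A loop that never runs gets {0, 0}, mirroring the
// LP0_0 expectation in check-prof-info.ll.
BranchWeights weightsFor(uint64_t Invocations, uint64_t TripCount) {
  if (TripCount == 0)
    return {0, 0};
  return {Invocations, Invocations * (TripCount - 1)};
}

// Hypothetical pair of results: vector loop and scalar remainder loop.
struct SplitWeights {
  BranchWeights Vector;
  BranchWeights Remainder;
};

// Split the original estimate: the vector loop runs TC / (VF * UF)
// iterations, the remainder loop runs TC % (VF * UF) iterations.
SplitWeights splitForVectorization(BranchWeights Orig, uint64_t VFxUF) {
  assert(VFxUF > 0 && "VF * UF must be positive");
  uint64_t TC = estimatedTripCount(Orig);
  return {weightsFor(Orig.Exit, TC / VFxUF), weightsFor(Orig.Exit, TC % VFxUF)};
}
```

Under these assumptions, `splitForVectorization({10, 10000}, 4)` gives {10, 2490} for the vector loop and {10, 0} for the remainder, as in the tripcount.ll checks; `{1, 1026}` with VF*UF = 16 gives {1, 63} and {1, 2}, as in the CHECK-MASKED lines for `@_Z3foo2v`.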

Revision Contents

Diff 232778
