This is an archive of the discontinued LLVM Phabricator instance.

Differential D67905

[LV] Vectorizer should adjust trip count in profile information
ClosedPublic

Authored by ebrevnov on Sep 23 2019, 4:33 AM.

Details

Reviewers

hsaito
Ayal
fhahn
reames
silvas
dcaballe
SjoerdMeijer
mkuper
DaniilSuchkov

Commits

rGaf7e1588727c: [LV] Vectorizer should adjust trip count in profile information

Summary

The vectorized loop processes VFxUF elements per iteration, so the total number of iterations decreases proportionally. In addition, the epilog loop cannot have more than VFxUF - 1 iterations. This patch updates the profile information accordingly.
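The arithmetic behind the summary can be sketched as follows. This is an illustration only: `TripCountSplit` and `splitTripCount` are hypothetical names that do not appear in the patch, and the sketch assumes every invocation reaches the vector loop (no runtime guard bypasses it, no tail folding, no required scalar epilogue).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the patch: splits a scalar trip count
// between the vector loop and the scalar epilog loop.
struct TripCountSplit {
  uint64_t VectorIterations; // iterations executed by the vectorized loop
  uint64_t EpilogIterations; // leftover iterations executed by the scalar loop
};

TripCountSplit splitTripCount(uint64_t TripCount, uint64_t VF, uint64_t UF) {
  assert(VF > 0 && UF > 0 && "VF and UF must be positive");
  const uint64_t Step = VF * UF; // elements processed per vector iteration
  // The remainder of the division is at most Step - 1 by construction.
  return {TripCount / Step, TripCount % Step};
}
```

For example, a trip count of 103 with VF = 4 and UF = 2 leaves 12 vector iterations and 7 epilog iterations, the largest remainder possible for VFxUF = 8.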

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

ebrevnov created this revision. Sep 23 2019, 4:33 AM

Herald added a project: Restricted Project. Sep 23 2019, 4:33 AM

Herald added subscribers: llvm-commits, rkruppe, hiraditya.

Harbormaster completed remote builds in B38417: Diff 221285. Sep 23 2019, 4:34 AM

ebrevnov added reviewers: hsaito, Ayal, fhahn, reames. Sep 23 2019, 4:36 AM

Minor test update

ping

ebrevnov added reviewers: silvas, dcaballe, SjoerdMeijer. Oct 24 2019, 4:18 AM

ebrevnov added a reviewer: mkuper. Oct 24 2019, 11:11 PM

ebrevnov added a reviewer: DaniilSuchkov. Nov 12 2019, 1:55 AM

LGTM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4055	Nit: VFxUF - 1
4061–4063	Style: it is usually advised to turn such conditions into early exits; that would reduce the required indentation and slightly improve readability.
4082	Maybe introduce a new variable for this value (like EpilogueTakenCount)? Right now it's a bit surprising that OrigSomething is being changed. Same goes for OrigFallThroughCount.

This revision is now accepted and ready to land. Nov 12 2019, 10:20 PM

Minor fixes as requested by reviewer.

I realized that the current implementation has a flaw: we should take into account that the actual number of iterations is one greater than the back edge taken count. In addition, I believe that the current structuring of the calculations is easier to understand.

LGTM

ebrevnov added a parent revision: D67805: [LV] Allow vectorization of hot short trip count loops with epilog. Nov 20 2019, 3:43 AM

Adding a few comments. Would be good to generalize and apply also to loop unroll (and jam).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4064	OrigFallThroughCount can still be either the exit count or the continue-to-next-iteration count, according to the code below. Wait to test if it's zero until we know what it stands for?
4076	Better use distinct names, e.g., OrigExitCount and OrigBackedgeTakenCount, than continue to call them Taken and FallThrough. Perhaps use Weight instead of Count, to denote total profile frequencies, as the latter is used elsewhere to denote the actual per-invocation TripCount.
4079	bel[l]ow
4081	How about "OrigAverageTripCount"? Explanation about its computation: OrigAverageTripCount = (number of times header block was executed) / (number of times header was reached from pre-header == number of times latch exited) == (OrigTakenCount + OrigFallThroughCount) / OrigFallThroughCount == OrigTakenCount / OrigFallThroughCount + 1.
4083	How about VecAverageTripCount = OrigAverageTripCount / (VF * UF);
4087	Just to clarify, maintaining branch frequencies through optimizations is best-effort and imprecise - a total weight that does not divide VF*UF implies that the trip count of at least one invocation did not divide VF*UF, not necessarily all of them; w/o considering also the distribution of trip counts in addition to their sum. Setting PEIterCount = 0 and VecAverageTripCount = round(OrigAverageTripCount / (VF*UF)) when Cost->foldTailByMasking() is probably the best that can be done. The former is redundant given that it applies to dead code, and the latter should perhaps apply to all cases, in general.
4091	There's also the special case of requiresScalarEpilogue() where 0 < PEIterCount <= VF*UF for each invocation of the loop, and hence the average is also strictly positive FWIW. But best keep the approximation general instead of trying to improve it, given general lack of information.
4098	This assumes the number of times the vector loop will be reached is equal to the number of times the original scalar loop was reached (OrigFallThroughCount). This holds if Cost->foldTailByMasking(), but otherwise invocations whose trip count < VF*UF will bypass the vector loop (and also == VF*UF if requiresScalarEpilogue()), plus other run time guards.
4107	Similar to above comment, invocations whose trip count divides VF*UF will bypass the scalar remainder loop (w/o foldTailByMasking nor requiresScalarEpilogue), so in general PEFallThroughCount <= OrigFallThroughCount.
llvm/test/Transforms/LoopVectorize/check-prof-info.ll
4	May want to also check with UF>1.
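The identity from the comment on line 4081 can be written out as a single equation. The variable names are the ones used in the review; the typesetting is only a restatement of that comment, not additional reasoning from the patch.

```latex
\[
\text{OrigAverageTripCount}
  = \frac{\text{OrigTakenCount} + \text{OrigFallThroughCount}}{\text{OrigFallThroughCount}}
  = \frac{\text{OrigTakenCount}}{\text{OrigFallThroughCount}} + 1
\]
```

As an illustrative instance, OrigTakenCount = 10000 and OrigFallThroughCount = 10 give an average trip count of 10000/10 + 1 = 1001.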

Addressed Ayal's comments

ebrevnov marked an inline comment as done. Nov 21 2019, 4:59 AM

ebrevnov added inline comments. Nov 21 2019, 4:59 AM

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4064	Good catch. Thanks!
4076	I fixed names. But I don't see reasons to use different variables here (if this is what you meant)
4079	fixed
4081	Ok. Turned your explanation to a comment.
4091	Agree. Let me remove this special case then.
4098	Please note that VecFallThrough is zero initially and is set to OrigFallThroughCount only if the vector loop is expected to be executed (VecIterCount > 0)
4107	Same explanation as for the above.
llvm/test/Transforms/LoopVectorize/check-prof-info.ll
4	I replaced masked case since we don't do anything special for it now.

Harbormaster completed remote builds in B41303: Diff 230432. Nov 21 2019, 5:02 AM

ping @Ayal

Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for potential benefit of loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue though with the fact that LV may use the scalar loop for both the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope vector loop is always reached. In any case this deserves a comment.

Suggesting further variable name changes for the three Orig, Unrolled, and Remainder Loops, each having a LoopEntry==LoopExit edge weight, a Backedge weight, a HeaderBlock weight, and an AverageTripCount. The actual weights are recorded as TrueVal and FalseVal of the latch branches.

Patch needs to be clang-format'ed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4053	"is less ... than original TC" >> "is smaller than the original TC by a factor of VFxUF"
4060	OrigTakenWeight >> OrigBackedgeTakenWeight or OrigBackedgeWeight ? OrigExitWeight >> OrigLoopExitWeight? May help to also set OrigLoopEntryWeight = OrigLoopExitWeight?
4063	OrigBackBranchI >> OrigLoopLatchBranch ?
4066	It seems clearer to call extractProfMetadata(TrueVal, FalseVal) and then set BackedgeTaken and LoopExit/Entry weights according to if (OrigLoopLatchBranch->getSuccessor(0) == OrigLoop->getHeader()), following LoopUtil's getLoopEstimatedTripCount(). Analogously for createBranchWeights(TrueVal, FalseVal). In any case, better rename "IsTrueBackEdge*Loop".
4080	Patch needs to be clang-format'ed
4087	Instead of providing the explanation in a comment, seems better to implement the code this way, leaving the +1 for the compiler to optimize. I.e., const uint64_t OrigHeaderBlockWeight = OrigBackedgeTakenWeight + OrigLoopEntryWeight; const uint64_t OrigAverageTripCount = OrigHeaderBlockWeight / OrigLoopEntryWeight;
4094	Better rename/expand "PE". PEIterCount >> RemainderLoopAverageTripCount?
4098	In the general context, "Vec" >> "UnrolledLoop". VecTakenCount >> UnrolledLoopBackedgeWeight VecFallThrough >> UnrolledLoopExitWeight and/or UnrolledLoopEntryWeight
4100	How about if (UnrolledLoopAverageTripCount > 0) { UnrolledLoopEntryWeight = OrigLoopEntryWeight; uint64_t UnrolledLoopHeaderWeight = UnrolledLoopAverageTripCount * UnrolledLoopEntryWeight; // Analogous to computing OrigLoopAverageTripCount from Header and Entry weights above. UnrolledLoopBackedgeWeight = UnrolledLoopHeaderWeight - UnrolledLoopEntryWeight; } leaving the -1 optimization to the compiler.
4106	PETakenCount >> RemainderLoopBackedgeWeight PEFallThroughCount >> RemainderLoopExitWeight and/or RemainderLoopEntryWeight

In D67905#1770563, @Ayal wrote:

Still think it would be better to provide this as a standalone function in Transforms/Utils/LoopUtils, for potential benefit of loop unroll (and jam) passes in addition to LV. Having agreed to ignore foldTail and requiresScalarEpilog, there's nothing vectorization-specific to do here. There's still an issue though with the fact that LV may use the scalar loop for both the remaining TC%(VF*UF) iterations when running the vector loop, and for all TC iterations when runtime guards bypass the vector loop. In absence of information, each such guard could be assigned 0.5 probability, or one could be aggressively optimistic and hope vector loop is always reached. In any case this deserves a comment.

Suggesting further variable name changes for the three Orig, Unrolled, and Remainder Loops, each having a LoopEntry==LoopExit edge weight, a Backedge weight, a HeaderBlock weight, and an AverageTripCount. The actual weights are recorded as TrueVal and FalseVal of the latch branches.

Patch needs to be clang-format'ed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4066	Don't feel convinced. My point would be that extra variables and conditional reassignments make the code less readable. I think this is a very subjective thing.
4100	That will make computations less stable to overflow. Personally I feel the way it's written today has the same level of complexity for understanding.

Addressing issues raised by Ayal.

Typo fixed.

Harbormaster completed remote builds in B42093: Diff 232778. Dec 9 2019, 2:03 AM

Harbormaster completed remote builds in B42094: Diff 232779.

fedor.sergeev added a subscriber: fedor.sergeev. Dec 9 2019, 10:17 PM

ping @Ayal

Ayal added inline comments. Dec 18 2019, 1:51 PM

llvm/include/llvm/Transforms/Utils/LoopUtils.h
374	The fact that OrigLoop is both the original loop containing the original profile weights, and acts as the RemainderLoop dedicated to leftover iterations, should be clarified. Alternatively, this utility can receive three loops: OrigLoop, UnrolledLoop and RemainderLoop, leaving it to the caller to decide if to pass OrigLoop also as RemainderLoop. Would probably be clearer to start with UnrolledLoop receiving weights that reflect TC/UF iterations, and then OrigLoop which receives weights that reflect the remaining TC%UF iterations.
376	U[n]rolledLoop, several occurrences. "\UF" >> "\p UF"
llvm/lib/Transforms/Utils/LoopUtils.cpp
1102	Worth commenting that OrigLoopEntryWeight also holds OrigLoopExitWeight, which is more clearly the weight associated with the (exit direction of the) latch branch.
1110	UnrolledBBI >> Unrolled[Loop]LatchBranch, as in OrigLoopLatchBranch. As the names end up overflowing lines, can use Orig, Unrolled and Remainder to stand for OrigLoop, UnrolledLoop and RemainderLoop; i.e., taking "Loop" out.
1113	VecLoop >> UnrolledLoop
1123	Can drop the 'const', for consistency; these temporaries are obviously const's.
1126	Note that this is rounding down. Can add half of the denominator to the numerator before dividing in order to round more accurately; this is what getLoopEstimatedTripCount() does, but it seems to be off by 1 as it computes BackEdgeTakenWeight / LoopEntryWeight rounded to nearest, instead of HeaderBlockWeight / LoopEntryWeight rounded to nearest... Simply call OrigAverageTripCount = `getLoopEstimatedTripCount(OrigLoop)`? Perhaps having a `setLoopEstimatedTripCount(Loop, EstimatedTripCount, EstimatedEntryWeight)` would help fold the identical treatment of UnrolledLoop and RemainderLoop into one function, which also takes care of figuring out the True/False vs. Backedge/Exit directions?
1128	U[n]rolledAverageTripCount
1138	Seems slightly more logical to first set UnrolledLoopEntryWeight, and then using it set UnrolledLoopBackedgeWeight.
1147	ditto
1160	(This actually replaces the old profile metadata with the new one.)
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
118	Is this include still needed here?
3486	Comment below should start with a short sentence explaining that profile weights associated with the original loop are now distributed among the vector and scalar loops.
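The rounding remark on LoopUtils.cpp line 1126 can be illustrated with a small sketch. The function names below are hypothetical and are not LLVM APIs; the code only contrasts the two quotients mentioned in the comment (backedge weight over entry weight versus header weight over entry weight), both rounded to nearest by adding half of the denominator. It assumes the sums do not overflow 64 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: integer division rounded to nearest, obtained by
// adding half of the denominator to the numerator before dividing.
uint64_t divideNearest(uint64_t Numerator, uint64_t Denominator) {
  assert(Denominator != 0 && "division by zero");
  return (Numerator + Denominator / 2) / Denominator;
}

// Estimate derived from the backedge weight alone. This is the average
// number of backedge traversals, which is one less than the trip count.
uint64_t estimateFromBackedge(uint64_t BackedgeTakenWeight,
                              uint64_t LoopEntryWeight) {
  return divideNearest(BackedgeTakenWeight, LoopEntryWeight);
}

// Estimate derived from the header block weight, i.e. backedge weight plus
// entry weight. This counts every execution of the loop body.
uint64_t estimateFromHeader(uint64_t BackedgeTakenWeight,
                            uint64_t LoopEntryWeight) {
  return divideNearest(BackedgeTakenWeight + LoopEntryWeight, LoopEntryWeight);
}
```

With a backedge weight of 10000 and an entry weight of 10, the first estimate is 1000 and the second is 1001, which is the "off by 1" discussed in the comment.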

ebrevnov marked 16 inline comments as done. Dec 26 2019, 3:29 AM

ebrevnov added inline comments.

llvm/lib/Transforms/Utils/LoopUtils.cpp
1102	IMHO instead of trying to clarify with a comment we better find self descriptive name for such a simple and commonly used thing. Strictly speaking OrigLoopEntryWeight != OrigLoopExitWeight. Do you find OrigBackEdgeExitWeight good enough?
1110	Removed "Loop" from most names to make them a little shorter.
1126	This "off by 1" stopped me from using it in the first place since that could be important in some cases. OK, let's reuse getLoopEstimatedTripCount. To be able to do that there are some changes to getLoopEstimatedTripCount.

Updated as requested.

Harbormaster completed remote builds in B42951: Diff 235335. Dec 26 2019, 3:32 AM

Hi @Ayal. Thanks for your input. I fixed all places as you suggested. Please check.

Thanks for making all the changes! More comments inline.

llvm/lib/Transforms/Utils/LoopUtils.cpp
693–694	Comment what this new function is for. Rename (see below)? Retain "Support loops ..." comment, added in D64553?
730	dyn_cast >> cast Perhaps update above function to do here something like `BranchInst *LatchBR = getExpectedExitLoopLatchBranch(L)` checking if it returned nullptr or not?
742	Thanks for taking care of this fix to improve accuracy of estimated TC! But doing so deserves a separate patch, and tests. Note that it also affects loop unrolling, i.e., its effects are beyond LV. This part can be introduced either before or after the part that teaches LV to maintain profiling info.
763	ditto (dyn_cast >> cast, ...)
764	Better check similar to above `if (LatchBR->getSuccessor(0) != L->getHeader())`
1098	"the \p UnrolledLoop \p RemainderLoop" >> "\p UnrolledLoop and \p RemainderLoop"
1102	May also be worthwhile asserting that UF is positive (or greater than 1?)
1102	> IMHO instead of trying to clarify with a comment we better find self descriptive name for such a simple and commonly used thing.
Definitely agree with (the preference of) finding self-descriptive variable names.
> Strictly speaking OrigLoopEntryWeight != OrigLoopExitWeight
How so, given that OrigLoop has a single entry and an "expected" single exit: there may be other "side/deopt" exits, but these are expected to have zero weight. E.g., when computing LoopHeaderWeight above, adding LoopEntryWeight (instead of LoopExitWeight) to BackEdgeTakenWeight seems more logical.
> Do you find OrigBackEdgeExitWeight good enough?
Strictly speaking, a BackEdge cannot exit: it is an edge going from a latch block to a header block, and "BackEdgeTaken" is the number of times this edge is taken=traversed, which equals the number of times its latch branch is "taken" rather than "falls-thru" (if its true direction points to the header). Perhaps LoopEntryExitWeight or LoopInvocationWeight could be used instead/in addition.
1103	are expected to be distinct
1126	> This "off by 1" stopped me from using it in the first place since that could be important in some cases. OK, let's reuse getLoopEstimatedTripCount. To be able to do that there are some changes to getLoopEstimatedTripCount.
Indeed, best fix and reuse, in a dedicated patch as raised above, thereby isolating impact on such "some cases".
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3495	"is not taken into account as unlikely case" >> "is ignored, assigning all the weight to the vector loop, optimistically"
llvm/test/Transforms/LoopVectorize/check-prof-info.ll
7	Tests targeting x86 need to reside in LoopVectorize/X86

Ayal mentioned this in D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount. Dec 30 2019, 4:31 AM

One more round of updates.

ebrevnov edited parent revisions, added: D71990: [LoopUtils] Better accuracy for getLoopEstimatedTripCount.; removed: D67805: [LV] Allow vectorization of hot short trip count loops with epilog. Dec 30 2019, 4:33 AM

Harbormaster completed remote builds in B43034: Diff 235579. Dec 30 2019, 4:35 AM

ebrevnov added inline comments. Dec 30 2019, 11:47 PM

llvm/lib/Transforms/Utils/LoopUtils.cpp
693–694	The comment is related to get/setLoopEstimatedTripCount and still there...
730	I was thinking about that in the first place but didn't come up with a "good" enough name. I can go with that name if you like :-)
742	Ok.
1102	I think we better support 1 which could be used in some corner cases....
1102	> How so, given that OrigLoop has a single-entry, and an "expected" single-exit: there may be other "side/deopt" exits, but these are expected to have zero weight. E.g., when computing LoopHeaderWeight above, adding LoopEntryWeight (instead of LoopExitWeight) to BackEdgeTakenWeight seems more logical.
For an infinite loop, entry is 1 while exit is 0 :-). I understand this is an extreme we will never meet, but still....
> Strictly speaking, a BackEdge cannot exit: it is an edge going from a latch block to a header block, and "BackEdgeTaken" is the number of times this edge is taken=traversed, which equals the number of times its latch branch is "taken" rather than "falls-thru" (if it's true direction points to the header).
That's why I used TakenCount and FallThroughCount in the very first version, which perfectly matches your description. I don't feel we are getting any better names with more iterations.... Perhaps LatchCycleWeight - number of times we go to loop header from the latch and LatchExitWeight - number of times we go to loop exit from the latch?
> Perhaps LoopEntryExitWeight or LoopInvocationWeight could be used instead/in addition.
I think LoopEntryExitWeight may be confusing.... I think it makes sense to use EstimatedLoopInvocationWeight in conjunction with EstimatedTripCount as parameters to the get/setEstimatedTripCount interface, while LatchCycleWeight and LatchExitWeight are used in the implementation as they are a little bit more low level.

Rebase

Harbormaster completed remote builds in B43561: Diff 236970. Jan 8 2020, 11:21 PM

Rebase

Harbormaster completed remote builds in B43570: Diff 236991. Jan 9 2020, 2:26 AM

ping @Ayal

This looks good to me, thanks!

llvm/lib/Transforms/Utils/LoopUtils.cpp
779	The last part could call fixupBranchWeights() if moved here from Transforms/Utils/LoopUnrollPeel.cpp
llvm/test/Transforms/LoopVectorize/tripcount.ll
211	Following this, to clarify: original loop has latchExitWeight=10 and backedgeTakenWeight=10,000, therefore estimatedBackedgeTakenCount=1,000 and estimatedTripCount=1,001. Vectorizing by 4 produces estimatedTripCounts of 1,001/4=250 and 1,001%4=1 for vectorized and remainder loops, respectively, therefore their estimatedBackedgeTakenCounts are 249 and 0, and so the weights recorded with loop invocation weights of 10 are the above {10, 2490} and {10, 0}.
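The arithmetic in the tripcount.ll comment can be reproduced with a short sketch. `LatchWeights` and `weightsForTripCount` are hypothetical names used only for illustration; they mimic the idea of turning an estimated trip count and a loop invocation weight into a pair of latch branch weights. How a zero trip count is handled here is an assumption of the sketch, not something stated in the comment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the pair of weights recorded on a latch branch.
struct LatchWeights {
  uint64_t ExitWeight;     // times the latch leaves the loop
  uint64_t BackedgeWeight; // times the latch jumps back to the header
};

// Weights for a loop that is invoked InvocationWeight times and runs
// EstimatedTripCount iterations per invocation.
LatchWeights weightsForTripCount(uint64_t EstimatedTripCount,
                                 uint64_t InvocationWeight) {
  // Assumption of this sketch: a loop estimated to run zero iterations
  // receives no weight at all.
  if (EstimatedTripCount == 0)
    return {0, 0};
  // N iterations traverse the backedge N - 1 times per invocation.
  const uint64_t BackedgeTakenCount = EstimatedTripCount - 1;
  assert((InvocationWeight == 0 ||
          BackedgeTakenCount <= UINT64_MAX / InvocationWeight) &&
         "backedge weight would overflow");
  return {InvocationWeight, BackedgeTakenCount * InvocationWeight};
}
```

Starting from latchExitWeight = 10 and backedgeTakenWeight = 10,000, the estimated trip count is 10,000/10 + 1 = 1,001. Calling `weightsForTripCount(1001 / 4, 10)` and `weightsForTripCount(1001 % 4, 10)` yields {10, 2490} and {10, 0}, matching the values in the comment.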

ebrevnov marked 2 inline comments as done. Jan 15 2020, 7:17 PM

ebrevnov added inline comments.

llvm/lib/Transforms/Utils/LoopUtils.cpp
779	That would require changing the implementation of fixupBranchWeights, since it skips the update when the back edge taken count is zero.
llvm/test/Transforms/LoopVectorize/tripcount.ll
211	I will add this text to the test. Is that what you wanted (just not sure :-))?

Rebase

Harbormaster completed remote builds in B44139: Diff 238455. Jan 16 2020, 4:09 AM

Ayal added inline comments. Jan 16 2020, 11:55 PM

llvm/lib/Transforms/Utils/LoopUtils.cpp
779	Agreed. Regarding disregarding to update when backedge taken weight is zero, note that in `fixupBranchWeights()`, `BackedgeTakenWeight` is called `FallThroughWeight`(?), described as "The weight of the edge from Latch to Header", and that its comment says "FallThroughWeight is 0 means that there is no branch weights on original latch block or estimated trip count is zero." Regarding the first meaning of 0, whoever calls `fixupBranchWeights()` should do so only if there were such weights on the original latch block, similar to the caller of `setLoopEstimatedTripCount()`. Regarding the second meaning of 0, it seems `fixupBranchWeights()` suffers from the same +1 issue: estimating trip count to be zero when backedge taken weight is zero. It would be good to fix and centralize the support for updating weights of loops, but such refactoring can be done as a separate follow-up patch, after landing this (accepted) patch.

Closed by commit rGaf7e1588727c: [LV] Vectorizer should adjust trip count in profile information (authored by ebrevnov). Jan 20 2020, 3:39 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/include/llvm/Transforms/Utils/LoopUtils.h	33 lines
llvm/lib/Transforms/Utils/LoopUtils.cpp	91 lines
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp	13 lines
llvm/test/Transforms/LoopVectorize/check-prof-info.ll	96 lines
llvm/test/Transforms/LoopVectorize/tripcount.ll	15 lines

Diff 239060

llvm/include/llvm/Transforms/Utils/LoopUtils.h

[256 unchanged lines not shown]
 /// @}

 /// Set input string into loop metadata by keeping other values intact.
 /// If the string is already in loop metadata update value if it is
 /// different.
 void addStringMetadataToLoop(Loop *TheLoop, const char *MDString,
                              unsigned V = 0);

-/// Get a loop's estimated trip count based on branch weight metadata.
+/// Returns a loop's estimated trip count based on branch weight metadata.
+/// In addition if \p EstimatedLoopInvocationWeight is not null it is
+/// initialized with weight of loop's latch leading to the exit.
 /// Returns 0 when the count is estimated to be 0, or None when a meaningful
 /// estimate can not be made.
-Optional<unsigned> getLoopEstimatedTripCount(Loop *L);
+Optional<unsigned>
+getLoopEstimatedTripCount(Loop *L,
+                          unsigned *EstimatedLoopInvocationWeight = nullptr);
+
+/// Set a loop's branch weight metadata to reflect that loop has \p
+/// EstimatedTripCount iterations and \p EstimatedLoopInvocationWeight exits
+/// through latch. Returns true if metadata is successfully updated, false
+/// otherwise. Note that loop must have a latch block which controls loop exit
+/// in order to succeed.
+bool setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
+                               unsigned EstimatedLoopInvocationWeight);

 /// Check inner loop (L) backedge count is known to be invariant on all
 /// iterations of its outer loop. If the loop has no parent, this is trivially
 /// true.
 bool hasIterationCountInvariantInParent(Loop *L, ScalarEvolution &SE);

 /// Helper to consistently add the set of standard passes to a loop pass's \c
 /// AnalysisUsage.
[77 unchanged lines not shown]
 bool cannotBeMaxInLoop(const SCEV *S, const Loop *L, ScalarEvolution &SE,
                        bool Signed);

 /// Returns true if \p S is defined and never is equal to signed/unsigned min.
 bool cannotBeMinInLoop(const SCEV *S, const Loop *L, ScalarEvolution &SE,
                        bool Signed);

 enum ReplaceExitVal { NeverRepl, OnlyCheapRepl, NoHardUse, AlwaysRepl };

 /// If the final value of any expressions that are recurrent in the loop can
 /// be computed, substitute the exit values from the loop into any instructions
 /// outside of the loop that use the final values of the current expressions.
 /// Return the number of loop exit values that have been replaced, and the
 /// corresponding phi node will be added to DeadInsts.
 int rewriteLoopExitValues(Loop *L, LoopInfo *LI, TargetLibraryInfo *TLI,
                           ScalarEvolution *SE, SCEVExpander &Rewriter,
                           DominatorTree *DT, ReplaceExitVal ReplaceExitValue,
                           SmallVector<WeakTrackingVH, 16> &DeadInsts);

+/// Set weights for \p UnrolledLoop and \p RemainderLoop based on weights for
+/// \p OrigLoop and the following distribution of \p OrigLoop iteration among \p
+/// UnrolledLoop and \p RemainderLoop. \p UnrolledLoop receives weights that
+/// reflect TC/UF iterations, and \p RemainderLoop receives weights that reflect
+/// the remaining TC%UF iterations.
+///
+/// Note that \p OrigLoop may be equal to either \p UnrolledLoop or \p
+/// RemainderLoop in which case weights for \p OrigLoop are updated accordingly.
+/// Note also behavior is undefined if \p UnrolledLoop and \p RemainderLoop are
+/// equal. \p UF must be greater than zero.
+/// If \p OrigLoop has no profile info associated nothing happens.
+///
+/// This utility may be useful for such optimizations as unroller and
+/// vectorizer as it's typical transformation for them.
+void setProfileInfoAfterUnrolling(Loop *OrigLoop, Loop *UnrolledLoop,
+                                  Loop *RemainderLoop, uint64_t UF);
+
 } // end namespace llvm

 #endif // LLVM_TRANSFORMS_UTILS_LOOPUTILS_H

llvm/lib/Transforms/Utils/LoopUtils.cpp

[26 unchanged lines not shown]
#include "llvm/Analysis/ScalarEvolutionExpander.h"		#include "llvm/Analysis/ScalarEvolutionExpander.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"		#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/TargetTransformInfo.h"		#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/Analysis/ValueTracking.h"		#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DIBuilder.h"		#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/Dominators.h"		#include "llvm/IR/Dominators.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
		#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/PatternMatch.h"		#include "llvm/IR/PatternMatch.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/InitializePasses.h"		#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/KnownBits.h"		#include "llvm/Support/KnownBits.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"		#include "llvm/Transforms/Utils/BasicBlockUtils.h"
[641 unchanged lines not shown]	if (LI) {
} else {		} else {
Loop::iterator I = find(LI->begin(), LI->end(), L);		Loop::iterator I = find(LI->begin(), LI->end(), L);
assert(I != LI->end() && "Couldn't find loop");		assert(I != LI->end() && "Couldn't find loop");
LI->removeLoop(I);		LI->removeLoop(I);
}		}
LI->destroy(L);		LI->destroy(L);
}		}
}		}

Optional<unsigned> llvm::getLoopEstimatedTripCount(Loop *L) {		/// Checks if \p L has single exit through latch block except possibly
// Support loops with an exiting latch and other existing exists only		/// "deoptimizing" exits. Returns branch instruction terminating the loop
// deoptimize.		/// latch if above check is successful, nullptr otherwise.
		static BranchInst *getExpectedExitLoopLatchBranch(Loop *L) {
// Get the branch weights for the loop's backedge.
BasicBlock *Latch = L->getLoopLatch();		BasicBlock *Latch = L->getLoopLatch();
if (!Latch)		if (!Latch)
return None;		return nullptr;

BranchInst *LatchBR = dyn_cast<BranchInst>(Latch->getTerminator());		BranchInst *LatchBR = dyn_cast<BranchInst>(Latch->getTerminator());
if (!LatchBR || LatchBR->getNumSuccessors() != 2 || !L->isLoopExiting(Latch))		if (!LatchBR || LatchBR->getNumSuccessors() != 2 || !L->isLoopExiting(Latch))
return None;		return nullptr;

assert((LatchBR->getSuccessor(0) == L->getHeader() ||		assert((LatchBR->getSuccessor(0) == L->getHeader() ||
LatchBR->getSuccessor(1) == L->getHeader()) &&		LatchBR->getSuccessor(1) == L->getHeader()) &&
"At least one edge out of the latch must go to the header");		"At least one edge out of the latch must go to the header");

SmallVector<BasicBlock *, 4> ExitBlocks;		SmallVector<BasicBlock *, 4> ExitBlocks;
L->getUniqueNonLatchExitBlocks(ExitBlocks);		L->getUniqueNonLatchExitBlocks(ExitBlocks);
if (any_of(ExitBlocks, [](const BasicBlock *EB) {		if (any_of(ExitBlocks, [](const BasicBlock *EB) {
return !EB->getTerminatingDeoptimizeCall();		return !EB->getTerminatingDeoptimizeCall();
}))		}))
		return nullptr;

		return LatchBR;
		}

		Optional<unsigned>
		llvm::getLoopEstimatedTripCount(Loop *L,
		unsigned *EstimatedLoopInvocationWeight) {
		// Support loops with an exiting latch and other existing exists only
		// deoptimize.
		BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);
		if (!LatchBranch)
return None;		return None;

// To estimate the number of times the loop body was executed, we want to		// To estimate the number of times the loop body was executed, we want to
// know the number of times the backedge was taken, vs. the number of times		// know the number of times the backedge was taken, vs. the number of times
		Ayal [Done]: dyn_cast >> cast. Perhaps update above function to do here something like `BranchInst *LatchBR = getExpectedExitLoopLatchBranch(L)`, checking if it returned nullptr or not?
		ebrevnov (author) [Done]: I was thinking about that in the first place but didn't come up with a "good" enough name. I can go with that name if you like :-)
// we exited the loop.		// we exited the loop.
uint64_t BackedgeTakenWeight, LatchExitWeight;		uint64_t BackedgeTakenWeight, LatchExitWeight;
if (!LatchBR->extractProfMetadata(BackedgeTakenWeight, LatchExitWeight))		if (!LatchBranch->extractProfMetadata(BackedgeTakenWeight, LatchExitWeight))
return None;		return None;

if (LatchBR->getSuccessor(0) != L->getHeader())		if (LatchBranch->getSuccessor(0) != L->getHeader())
std::swap(BackedgeTakenWeight, LatchExitWeight);		std::swap(BackedgeTakenWeight, LatchExitWeight);

if (!LatchExitWeight)		if (!LatchExitWeight)
return None;		return None;

		if (EstimatedLoopInvocationWeight)
		Ayal [Done]: Thanks for taking care of this fix to improve accuracy of estimated TC! But doing so deserves a separate patch, and tests. Note that it also affects loop unrolling, i.e., its effects are beyond LV. This part can be introduced either before or after the part that teaches LV to maintain profiling info.
		ebrevnov (author) [Done]: Ok.
		*EstimatedLoopInvocationWeight = LatchExitWeight;

// Estimated backedge taken count is a ratio of the backedge taken weight by		// Estimated backedge taken count is a ratio of the backedge taken weight by
// the weight of the edge exiting the loop, rounded to nearest.		// the weight of the edge exiting the loop, rounded to nearest.
uint64_t BackedgeTakenCount =		uint64_t BackedgeTakenCount =
llvm::divideNearest(BackedgeTakenWeight, LatchExitWeight);		llvm::divideNearest(BackedgeTakenWeight, LatchExitWeight);
// Estimated trip count is one plus estimated backedge taken count.		// Estimated trip count is one plus estimated backedge taken count.
return BackedgeTakenCount + 1;		return BackedgeTakenCount + 1;
}		}
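
As a worked illustration of the estimate computed above, here is a minimal standalone sketch. It does not use the LLVM API: `divideNearestSketch` and `estimateTripCountSketch` are hypothetical names, and the rounding helper is only assumed to behave like `llvm::divideNearest` (round to nearest, halves up).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for llvm::divideNearest: unsigned division rounded
// to the nearest integer (assumption: halves round up).
inline uint64_t divideNearestSketch(uint64_t Numerator, uint64_t Denominator) {
  assert(Denominator != 0 && "division by zero");
  return (Numerator + Denominator / 2) / Denominator;
}

// Hypothetical model of the estimate: the trip count is one plus the rounded
// ratio of the backedge-taken weight to the latch-exit weight. Returns false
// when the exit weight is zero, in which case no estimate is produced.
inline bool estimateTripCountSketch(uint64_t BackedgeTakenWeight,
                                    uint64_t LatchExitWeight,
                                    uint64_t &TripCount) {
  if (LatchExitWeight == 0)
    return false;
  TripCount = divideNearestSketch(BackedgeTakenWeight, LatchExitWeight) + 1;
  return true;
}
```

For example, a latch branch with weights 99 (backedge) and 1 (exit) yields an estimated trip count of 100, and a loop whose backedge is never taken still yields 1 rather than 0.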

		bool llvm::setLoopEstimatedTripCount(Loop *L, unsigned EstimatedTripCount,
		unsigned EstimatedloopInvocationWeight) {
		// Support loops with an exiting latch and other existing exists only
		// deoptimize.
		BranchInst *LatchBranch = getExpectedExitLoopLatchBranch(L);
		if (!LatchBranch)
		return false;

		// Calculate taken and exit weights.
		unsigned LatchExitWeight = 0;
		unsigned BackedgeTakenWeight = 0;
		Ayal [Done]: ditto (dyn_cast >> cast, ...)

		Ayal [Done]: Better check similar to above `if (LatchBR->getSuccessor(0) != L->getHeader())`
		if (EstimatedTripCount > 0) {
		LatchExitWeight = EstimatedloopInvocationWeight;
		BackedgeTakenWeight = (EstimatedTripCount - 1) * LatchExitWeight;
		}

		// Make a swap if back edge is taken when condition is "false".
		if (LatchBranch->getSuccessor(0) != L->getHeader())
		std::swap(BackedgeTakenWeight, LatchExitWeight);

		MDBuilder MDB(LatchBranch->getContext());

		// Set/Update profile metadata.
		LatchBranch->setMetadata(
		LLVMContext::MD_prof,
		MDB.createBranchWeights(BackedgeTakenWeight, LatchExitWeight));
		Ayal [Not Done]: The last part could call fixupBranchWeights() if moved here from Transforms/Utils/LoopUnrollPeel.cpp
		ebrevnov (author) [Done]: That would require changing the implementation of fixupBranchWeights, since it skips the update when the back edge taken count is zero.
		Ayal [Not Done]: Agreed. Regarding skipping the update when backedge taken weight is zero, note that in `fixupBranchWeights()`, `BackedgeTakenWeight` is called `FallThroughWeight`(?) "The weight of the edge from Latch to Header", and that
		// FallThroughWeight is 0 means that there is no branch weights on original
		// latch block or estimated trip count is zero.
		Regarding the first meaning of 0, whoever calls `fixupBranchWeights()` should do so only if there were such weights on the original latch block, similar to the caller of `setLoopEstimatedTripCount()`. Regarding the second meaning of 0, it seems `fixupBranchWeights()` suffers from the same +1 issue: estimating trip count to be zero when backedge taken weight is zero. It would be good to fix and centralize the support for updating weights of loops, but such refactoring can be done as a separate follow-up patch, after landing this (accepted) patch.

		return true;
		}
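
The weight computation in setLoopEstimatedTripCount can be sketched in isolation as follows. This is a standalone illustration, not the LLVM implementation: `LatchWeightsSketch`, `computeLatchWeightsSketch` and the `BackedgeIsTrueSuccessor` flag are hypothetical names standing in for the branch metadata and the `getSuccessor(0) != L->getHeader()` check.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical pair of weights for the true and false successors of the
// latch branch, in that order.
struct LatchWeightsSketch {
  uint64_t TrueWeight;
  uint64_t FalseWeight;
};

// Hypothetical model: derive latch branch weights from an estimated trip
// count and a loop invocation weight. A zero trip count leaves both weights
// at zero, mirroring the guarded computation in the patch.
inline LatchWeightsSketch
computeLatchWeightsSketch(uint64_t EstimatedTripCount,
                          uint64_t InvocationWeight,
                          bool BackedgeIsTrueSuccessor) {
  uint64_t LatchExitWeight = 0;
  uint64_t BackedgeTakenWeight = 0;
  if (EstimatedTripCount > 0) {
    LatchExitWeight = InvocationWeight;
    BackedgeTakenWeight = (EstimatedTripCount - 1) * LatchExitWeight;
  }
  // Swap when the backedge is taken on the "false" condition.
  if (!BackedgeIsTrueSuccessor)
    std::swap(BackedgeTakenWeight, LatchExitWeight);
  return {BackedgeTakenWeight, LatchExitWeight};
}
```

With an estimated trip count of 100 and an invocation weight of 3, the sketch produces weights 297 and 3, which an estimate of the form "backedge weight / exit weight + 1" maps back to 100.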

bool llvm::hasIterationCountInvariantInParent(Loop *InnerLoop,		bool llvm::hasIterationCountInvariantInParent(Loop *InnerLoop,
ScalarEvolution &SE) {		ScalarEvolution &SE) {
Loop *OuterL = InnerLoop->getParentLoop();		Loop *OuterL = InnerLoop->getParentLoop();
if (!OuterL)		if (!OuterL)
return true;		return true;

// Get the backedge taken count for the inner loop		// Get the backedge taken count for the inner loop
BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();		BasicBlock *InnerLoopLatch = InnerLoop->getLoopLatch();
▲ Show 20 Lines • Show All 298 Lines • ▼ Show 20 Lines	bool llvm::cannotBeMaxInLoop(const SCEV *S, const Loop *L, ScalarEvolution &SE,
APInt Max = Signed ? APInt::getSignedMaxValue(BitWidth) :		APInt Max = Signed ? APInt::getSignedMaxValue(BitWidth) :
APInt::getMaxValue(BitWidth);		APInt::getMaxValue(BitWidth);
auto Predicate = Signed ? ICmpInst::ICMP_SLT : ICmpInst::ICMP_ULT;		auto Predicate = Signed ? ICmpInst::ICMP_SLT : ICmpInst::ICMP_ULT;
return SE.isAvailableAtLoopEntry(S, L) &&		return SE.isAvailableAtLoopEntry(S, L) &&
SE.isLoopEntryGuardedByCond(L, Predicate, S,		SE.isLoopEntryGuardedByCond(L, Predicate, S,
SE.getConstant(Max));		SE.getConstant(Max));
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
		Ayal [Done]: "the \p UnrolledLoop \p RemainderLoop" >> "\p UnrolledLoop and \p RemainderLoop"
// rewriteLoopExitValues - Optimize IV users outside the loop.		// rewriteLoopExitValues - Optimize IV users outside the loop.
// As a side effect, reduces the amount of IV processing within the loop.		// As a side effect, reduces the amount of IV processing within the loop.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

		Ayal [Done]: Worth commenting that OrigLoopEntryWeight also holds OrigLoopExitWeight, which is more clearly the weight associated with the (exit direction of the) latch branch.
		ebrevnov (author) [Done]: IMHO instead of trying to clarify with a comment we better find self descriptive name for such a simple and commonly used thing. Strictly speaking OrigLoopEntryWeight != OrigLoopExitWeight. Do you find OrigBackEdgeExitWeight good enough?
		Ayal [Done]:
		> IMHO instead of trying to clarify with a comment we better find self descriptive name for such a simple and commonly used thing.
		Definitely agree with (the preference of) finding self descriptive variable names.
		> Strictly speaking OrigLoopEntryWeight != OrigLoopExitWeight
		How so, given that OrigLoop has a single-entry, and an "expected" single-exit: there may be other "side/deopt" exits, but these are expected to have zero weight. E.g., when computing LoopHeaderWeight above, adding LoopEntryWeight (instead of LoopExitWeight) to BackEdgeTakenWeight seems more logical.
		> Do you find OrigBackEdgeExitWeight good enough?
		Strictly speaking, a BackEdge cannot exit: it is an edge going from a latch block to a header block, and "BackEdgeTaken" is the number of times this edge is taken=traversed, which equals the number of times its latch branch is "taken" rather than "falls-thru" (if its true direction points to the header). Perhaps LoopEntryExitWeight or LoopInvocationWeight could be used instead/in addition.
		ebrevnov (author) [Done]:
		> How so, given that OrigLoop has a single-entry, and an "expected" single-exit: there may be other "side/deopt" exits, but these are expected to have zero weight. E.g., when computing LoopHeaderWeight above, adding LoopEntryWeight (instead of LoopExitWeight) to BackEdgeTakenWeight seems more logical.
		For infinite loop entry is 1 while exit is 0 :-). I understand this is an extreme we will never meet but still....
		> Strictly speaking, a BackEdge cannot exit: it is an edge going from a latch block to a header block, and "BackEdgeTaken" is the number of times this edge is taken=traversed, which equals the number of times its latch branch is "taken" rather than "falls-thru" (if its true direction points to the header).
		That's why I used TakenCount and FallThroughCount in the very first version, which perfectly matches your description. I don't feel we are getting any better names with more iterations.... Perhaps LatchCycleWeight - number of times we go to loop header from the latch and LatchExitWeight - number of times we go to loop exit from the latch?
		> Perhaps LoopEntryExitWeight or LoopInvocationWeight could be used instead/in addition.
		I think LoopEntryExitWeight may be confusing.... I think it makes sense to use EstimatedLoopInvocationWeight in conjunction with EstimatedTripCount as parameters to get/setEstimatedTripCount interface while LatchCycleWeight and LatchExitWeight in the implementation as they are a little bit more low level.
		Ayal [Done]: May also be worthwhile asserting that UF is positive (or greater than 1?)
		ebrevnov (author) [Done]: I think we better support 1 which could be used in some corner cases....
// Return true if the SCEV expansion generated by the rewriter can replace the		// Return true if the SCEV expansion generated by the rewriter can replace the
		Ayal [Done]: are expected to be distinct
// original value. SCEV guarantees that it produces the same value, but the way		// original value. SCEV guarantees that it produces the same value, but the way
// it is produced may be illegal IR. Ideally, this function will only be		// it is produced may be illegal IR. Ideally, this function will only be
// called for verification.		// called for verification.
static bool isValidRewrite(ScalarEvolution *SE, Value *FromVal, Value *ToVal) {		static bool isValidRewrite(ScalarEvolution *SE, Value *FromVal, Value *ToVal) {
// If an SCEV expression subsumed multiple pointers, its expansion could		// If an SCEV expression subsumed multiple pointers, its expansion could
// reassociate the GEP changing the base pointer. This is illegal because the		// reassociate the GEP changing the base pointer. This is illegal because the
// final address produced by a GEP chain must be inbounds relative to its		// final address produced by a GEP chain must be inbounds relative to its
		Ayal [Done]: UnrolledBBI >> Unrolled[Loop]LatchBranch, as in OrigLoopLatchBranch. As the names end up overflowing lines, can use Orig, Unrolled and Remainder to stand for OrigLoop, UnrolledLoop and RemainderLoop; i.e., taking "Loop" out.
		ebrevnov (author) [Done]: Removed "Loop" from most names to make them a little shorter.
// underlying object. Otherwise basic alias analysis, among other things,		// underlying object. Otherwise basic alias analysis, among other things,
// could fail in a dangerous way. Ultimately, SCEV will be improved to avoid		// could fail in a dangerous way. Ultimately, SCEV will be improved to avoid
// producing an expression involving multiple pointers. Until then, we must		// producing an expression involving multiple pointers. Until then, we must
		Ayal [Done]: VecLoop >> UnrolledLoop
// bail out here.		// bail out here.
//		//
// Retrieve the pointer operand of the GEP. Don't use GetUnderlyingObject		// Retrieve the pointer operand of the GEP. Don't use GetUnderlyingObject
// because it understands lcssa phis while SCEV does not.		// because it understands lcssa phis while SCEV does not.
Value *FromPtr = FromVal;		Value *FromPtr = FromVal;
Value *ToPtr = ToVal;		Value *ToPtr = ToVal;
if (auto *GEP = dyn_cast<GEPOperator>(FromVal))		if (auto *GEP = dyn_cast<GEPOperator>(FromVal))
FromPtr = GEP->getPointerOperand();		FromPtr = GEP->getPointerOperand();

if (auto *GEP = dyn_cast<GEPOperator>(ToVal))		if (auto *GEP = dyn_cast<GEPOperator>(ToVal))
		Ayal [Done]: Can drop the 'const', for consistency; these temporaries are obviously const's.
ToPtr = GEP->getPointerOperand();		ToPtr = GEP->getPointerOperand();

if (FromPtr != FromVal || ToPtr != ToVal) {		if (FromPtr != FromVal || ToPtr != ToVal) {
		Ayal [Done]: Note that this is rounding down. Can add half of the denominator to the numerator before dividing in order to round more accurately; this is what getLoopEstimatedTripCount() does, but it seems to be off by 1 as it computes BackEdgeTakenWeight / LoopEntryWeight rounded to nearest, instead of HeaderBlockWeight / LoopEntryWeight rounded to nearest... Simply call OrigAverageTripCount = `getLoopEstimatedTripCount(OrigLoop)`? Perhaps having a `setLoopEstimatedTripCount(Loop, EstimatedTripCount, EstimatedEntryWeight)` would help fold the identical treatment of UnrolledLoop and RemainderLoop into one function, which also takes care of figuring out the True/False vs. Backedge/Exit directions?
		ebrevnov (author) [Done]: This "off by 1" stopped me from using it in the first place since that could be important in some cases. OK, let's reuse getLoopEstimatedTripCount. To be able to do that there are some changes to getLoopEstimatedTripCount.
		Ayal [Not Done]:
		> This "off by 1" stopped me from using it in the first place since that could be important in some cases. OK, let's reuse getLoopEstimatedTripCount. To be able to do that there are some changes to getLoopEstimatedTripCount.
		Indeed, best fix and reuse, in a dedicated patch as raised above, thereby isolating impact on such "some cases".
// Quickly check the common case		// Quickly check the common case
if (FromPtr == ToPtr)		if (FromPtr == ToPtr)
		Ayal [Done]: U[n]rolledAverageTripCount
return true;		return true;

// SCEV may have rewritten an expression that produces the GEP's pointer		// SCEV may have rewritten an expression that produces the GEP's pointer
// operand. That's ok as long as the pointer operand has the same base		// operand. That's ok as long as the pointer operand has the same base
// pointer. Unlike GetUnderlyingObject(), getPointerBase() will find the		// pointer. Unlike GetUnderlyingObject(), getPointerBase() will find the
// base of a recurrence. This handles the case in which SCEV expansion		// base of a recurrence. This handles the case in which SCEV expansion
// converts a pointer type recurrence into a nonrecurrent pointer base		// converts a pointer type recurrence into a nonrecurrent pointer base
// indexed by an integer recurrence.		// indexed by an integer recurrence.

// If the GEP base pointer is a vector of pointers, abort.		// If the GEP base pointer is a vector of pointers, abort.
		Ayal [Done]: Seems slightly more logical to first set UnrolledLoopEntryWeight, and then using it set UnrolledLoopBackedgeWeight.
if (!FromPtr->getType()->isPointerTy() || !ToPtr->getType()->isPointerTy())		if (!FromPtr->getType()->isPointerTy() || !ToPtr->getType()->isPointerTy())
return false;		return false;

const SCEV *FromBase = SE->getPointerBase(SE->getSCEV(FromPtr));		const SCEV *FromBase = SE->getPointerBase(SE->getSCEV(FromPtr));
const SCEV *ToBase = SE->getPointerBase(SE->getSCEV(ToPtr));		const SCEV *ToBase = SE->getPointerBase(SE->getSCEV(ToPtr));
if (FromBase == ToBase)		if (FromBase == ToBase)
return true;		return true;

LLVM_DEBUG(dbgs() << "rewriteLoopExitValues: GEP rewrite bail out "		LLVM_DEBUG(dbgs() << "rewriteLoopExitValues: GEP rewrite bail out "
		Ayal [Done]: ditto
<< FromBase << " != " << ToBase << "\n");		<< FromBase << " != " << ToBase << "\n");

return false;		return false;
}		}
return true;		return true;
}		}

static bool hasHardUserWithinLoop(const Loop *L, const Instruction *I) {		static bool hasHardUserWithinLoop(const Loop *L, const Instruction *I) {
SmallPtrSet<const Instruction *, 8> Visited;		SmallPtrSet<const Instruction *, 8> Visited;
SmallVector<const Instruction *, 8> WorkList;		SmallVector<const Instruction *, 8> WorkList;
Visited.insert(I);		Visited.insert(I);
WorkList.push_back(I);		WorkList.push_back(I);
while (!WorkList.empty()) {		while (!WorkList.empty()) {
		Ayal [Done]: (This actually replaces the old profile metadata with the new one.)
const Instruction *Curr = WorkList.pop_back_val();		const Instruction *Curr = WorkList.pop_back_val();
// This use is outside the loop, nothing to do.		// This use is outside the loop, nothing to do.
if (!L->contains(Curr))		if (!L->contains(Curr))
continue;		continue;
// Do we assume it is a "hard" use which will not be eliminated easily?		// Do we assume it is a "hard" use which will not be eliminated easily?
if (Curr->mayHaveSideEffects())		if (Curr->mayHaveSideEffects())
return true;		return true;
// Otherwise, add all its users to worklist.		// Otherwise, add all its users to worklist.
▲ Show 20 Lines • Show All 224 Lines • ▼ Show 20 Lines	for (const RewritePhi &Phi : RewritePhiSet) {
}		}
}		}

// The insertion point instruction may have been deleted; clear it out		// The insertion point instruction may have been deleted; clear it out
// so that the rewriter doesn't trip over it later.		// so that the rewriter doesn't trip over it later.
Rewriter.clearInsertPoint();		Rewriter.clearInsertPoint();
return NumReplaced;		return NumReplaced;
}		}

		/// Set weights for \p UnrolledLoop and \p RemainderLoop based on weights for
		/// \p OrigLoop.
		void llvm::setProfileInfoAfterUnrolling(Loop *OrigLoop, Loop *UnrolledLoop,
		Loop *RemainderLoop, uint64_t UF) {
		assert(UF > 0 && "Zero unrolled factor is not supported");
		assert(UnrolledLoop != RemainderLoop &&
		"Unrolled and Remainder loops are expected to be distinct");

		// Get number of iterations in the original scalar loop.
		unsigned OrigLoopInvocationWeight = 0;
		Optional<unsigned> OrigAverageTripCount =
		getLoopEstimatedTripCount(OrigLoop, &OrigLoopInvocationWeight);
		if (!OrigAverageTripCount)
		return;

		// Calculate number of iterations in unrolled loop.
		unsigned UnrolledAverageTripCount = *OrigAverageTripCount / UF;
		// Calculate number of iterations for remainder loop.
		unsigned RemainderAverageTripCount = *OrigAverageTripCount % UF;

		setLoopEstimatedTripCount(UnrolledLoop, UnrolledAverageTripCount,
		OrigLoopInvocationWeight);
		setLoopEstimatedTripCount(RemainderLoop, RemainderAverageTripCount,
		OrigLoopInvocationWeight);
		}
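
The split performed by setProfileInfoAfterUnrolling reduces to an integer division and a remainder. The following standalone sketch shows the arithmetic only; `SplitTripCountsSketch` and `splitTripCountSketch` are hypothetical names, and the real function additionally writes the results back as branch weights.

```cpp
#include <cassert>

// Hypothetical result type: estimated trip counts for the unrolled (vector)
// loop and for the remainder (epilogue) loop.
struct SplitTripCountsSketch {
  unsigned Unrolled;
  unsigned Remainder;
};

// Hypothetical model: distribute the original estimated trip count between
// the two loops for a given unroll factor (VF * UF for the vectorizer).
inline SplitTripCountsSketch splitTripCountSketch(unsigned OrigTripCount,
                                                  unsigned Factor) {
  assert(Factor > 0 && "Zero unroll factor is not supported");
  return {OrigTripCount / Factor, OrigTripCount % Factor};
}
```

For an original estimate of 103 iterations and a factor of 8, the unrolled loop gets 12 iterations and the remainder loop gets 7; the remainder is always below the factor, matching the "at most VFxUF - 1" bound stated in the summary.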

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp

Show First 20 Lines • Show All 109 Lines • ▼ Show 20 Lines
#include "llvm/IR/Function.h"		#include "llvm/IR/Function.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InstrTypes.h"		#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"		#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicInst.h"		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/Intrinsics.h"		#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"		#include "llvm/IR/Metadata.h"
		Ayal [Done]: Is this include still needed here?
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/Operator.h"		#include "llvm/IR/Operator.h"
#include "llvm/IR/Type.h"		#include "llvm/IR/Type.h"
#include "llvm/IR/Use.h"		#include "llvm/IR/Use.h"
#include "llvm/IR/User.h"		#include "llvm/IR/User.h"
#include "llvm/IR/Value.h"		#include "llvm/IR/Value.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/IR/Verifier.h"		#include "llvm/IR/Verifier.h"
▲ Show 20 Lines • Show All 3,351 Lines • ▼ Show 20 Lines	fixupIVUsers(Entry.first, Entry.second,
IVEndValues[Entry.first], LoopMiddleBlock);		IVEndValues[Entry.first], LoopMiddleBlock);

fixLCSSAPHIs();		fixLCSSAPHIs();
for (Instruction *PI : PredicatedInstructions)		for (Instruction *PI : PredicatedInstructions)
sinkScalarOperands(&*PI);		sinkScalarOperands(&*PI);

// Remove redundant induction instructions.		// Remove redundant induction instructions.
cse(LoopVectorBody);		cse(LoopVectorBody);

		Ayal [Done]: Comment below should start with a short sentence explaining that profile weights associated with the original loop are now distributed among the vector and scalar loops.
		// Set/update profile weights for the vector and remainder loops as original
		// loop iterations are now distributed among them. Note that original loop
		// represented by LoopScalarBody becomes remainder loop after vectorization.
		//
		// For cases like foldTailByMasking() and requiresScalarEpiloque() we may
		// end up getting slightly roughened result but that should be OK since
		// profile is not inherently precise anyway. Note also possible bypass of
		// vector code caused by legality checks is ignored, assigning all the weight
		// to the vector loop, optimistically.
		Ayal [Done]: "is not taken into account as unlikely case" >> "is ignored, assigning all the weight to the vector loop, optimistically"
		setProfileInfoAfterUnrolling(LI->getLoopFor(LoopScalarBody),
		LI->getLoopFor(LoopVectorBody),
		LI->getLoopFor(LoopScalarBody), VF * UF);
}		}

void InnerLoopVectorizer::fixCrossIterationPHIs() {		void InnerLoopVectorizer::fixCrossIterationPHIs() {
// In order to support recurrences we need to be able to vectorize Phi nodes.		// In order to support recurrences we need to be able to vectorize Phi nodes.
// Phi nodes have cycles, so we need to vectorize them in two stages. This is		// Phi nodes have cycles, so we need to vectorize them in two stages. This is
// stage #2: We now need to fix the recurrences by adding incoming edges to		// stage #2: We now need to fix the recurrences by adding incoming edges to
// the currently empty PHI nodes. At this point every instruction in the		// the currently empty PHI nodes. At this point every instruction in the
// original loop is widened to a vector form so we can use them to construct		// original loop is widened to a vector form so we can use them to construct
▲ Show 20 Lines • Show All 538 Lines • ▼ Show 20 Lines	for (unsigned i = 0; i < NumIncomingValues; ++i) {

// Scalar incoming value may need a broadcast		// Scalar incoming value may need a broadcast
Value *NewIncV = getOrCreateVectorValue(ScIncV, 0);		Value *NewIncV = getOrCreateVectorValue(ScIncV, 0);
NewPhi->addIncoming(NewIncV, NewPredBB);		NewPhi->addIncoming(NewIncV, NewPredBB);
}		}
}		}
}		}

void InnerLoopVectorizer::widenGEP(GetElementPtrInst *GEP, unsigned UF,		void InnerLoopVectorizer::widenGEP(GetElementPtrInst *GEP, unsigned UF,
		Ayal [Done]: "is less ... than original TC" >> "is smaller than the original TC by a factor of VFxUF"
unsigned VF, bool IsPtrLoopInvariant,		unsigned VF, bool IsPtrLoopInvariant,
SmallBitVector &IsIndexLoopInvariant) {		SmallBitVector &IsIndexLoopInvariant) {
		DaniilSuchkov [Not Done]: Nit: VFxUF - 1
// Construct a vector GEP by widening the operands of the scalar GEP as		// Construct a vector GEP by widening the operands of the scalar GEP as
// necessary. We mark the vector GEP 'inbounds' if appropriate. A GEP		// necessary. We mark the vector GEP 'inbounds' if appropriate. A GEP
// results in a vector of pointers when at least one operand of the GEP		// results in a vector of pointers when at least one operand of the GEP
// is vector-typed. Thus, to keep the representation compact, we only use		// is vector-typed. Thus, to keep the representation compact, we only use
// vector-typed operands for loop-varying values.		// vector-typed operands for loop-varying values.
		Ayal [Done]: OrigTakenWeight >> OrigBackedgeTakenWeight or OrigBackedgeWeight? OrigExitWeight >> OrigLoopExitWeight? May help to also set OrigLoopEntryWeight = OrigLoopExitWeight?

if (VF > 1 && IsPtrLoopInvariant && IsIndexLoopInvariant.all()) {		if (VF > 1 && IsPtrLoopInvariant && IsIndexLoopInvariant.all()) {
// If we are vectorizing, but the GEP has only loop-invariant operands,		// If we are vectorizing, but the GEP has only loop-invariant operands,
		DaniilSuchkov [Not Done]: Style: it is usually advised to turn such conditions into early exits, it would reduce required indentation and slightly improve readability.
		Ayal [Done]: OrigBackBranchI >> OrigLoopLatchBranch?
// the GEP we build (by only using vector-typed operands for		// the GEP we build (by only using vector-typed operands for
		Ayal [Not Done]: OrigFallThroughCount can still be either the exit count or the continue-to-next-iteration count, according to the code below. Wait to test if it's zero until we know what it stands for?
		ebrevnov (author) [Done]: Good catch. Thanks!
// loop-varying values) would be a scalar pointer. Thus, to ensure we		// loop-varying values) would be a scalar pointer. Thus, to ensure we
// produce a vector of pointers, we need to either arbitrarily pick an		// produce a vector of pointers, we need to either arbitrarily pick an
		Ayal [Not Done]: It seems clearer to call extractProfMetadata(TrueVal, FalseVal) and then set BackedgeTaken and LoopExit/Entry weights according to if (OrigLoopLatchBranch->getSuccessor(0) == OrigLoop->getHeader()), following LoopUtils' getLoopEstimatedTripCount(). Analogously for createBranchWeights(TrueVal, FalseVal). In any case, better rename "IsTrueBackEdgeLoop".
		ebrevnov (author) [Not Done]: Don't feel convinced. My point would be that extra variables and conditional reassignments make the code less readable. I think this is a very subjective thing.
// operand to broadcast, or broadcast a clone of the original GEP.		// operand to broadcast, or broadcast a clone of the original GEP.
// Here, we broadcast a clone of the original.		// Here, we broadcast a clone of the original.
//		//
// TODO: If at some point we decide to scalarize instructions having		// TODO: If at some point we decide to scalarize instructions having
// loop-invariant operands, this special case will no longer be		// loop-invariant operands, this special case will no longer be
// required. We would add the scalarization decision to		// required. We would add the scalarization decision to
// collectLoopScalars() and teach getVectorValue() to broadcast		// collectLoopScalars() and teach getVectorValue() to broadcast
// the lane-zero scalar value.		// the lane-zero scalar value.
auto *Clone = Builder.Insert(GEP->clone());		auto *Clone = Builder.Insert(GEP->clone());
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
		Ayal [Not Done]: Better use distinct names, e.g., OrigExitCount and OrigBackedgeTakenCount, than continue to call them Taken and FallThrough. Perhaps use Weight instead of Count, to denote total profile frequencies, as the latter is used elsewhere to denote the actual per-invocation TripCount.
		ebrevnov (author) [Done]: I fixed names. But I don't see reasons to use different variables here (if this is what you meant)
Value *EntryPart = Builder.CreateVectorSplat(VF, Clone);		Value *EntryPart = Builder.CreateVectorSplat(VF, Clone);
VectorLoopValueMap.setVectorValue(GEP, Part, EntryPart);		VectorLoopValueMap.setVectorValue(GEP, Part, EntryPart);
addMetadata(EntryPart, GEP);		addMetadata(EntryPart, GEP);
		Ayal (Not Done): bel[l]ow
		ebrevnov (Author, Done): fixed
}		}
		Ayal (Done): Patch needs to be clang-format'ed.
} else {		} else {
		Ayal (Not Done): How about "OrigAverageTripCount"? Explanation of its computation: OrigAverageTripCount = (number of times header block was executed) / (number of times header was reached from pre-header == number of times latch exited) == (OrigTakenCount + OrigFallThroughCount) / OrigFallThroughCount == OrigTakenCount / OrigFallThroughCount + 1.
		ebrevnov (Author, Done): OK. Turned your explanation into a comment.
// If the GEP has at least one loop-varying operand, we are sure to		// If the GEP has at least one loop-varying operand, we are sure to
		DaniilSuchkov (Not Done): Maybe introduce a new variable for this value (like EpilogueTakenCount)? Right now it's a bit surprising that OrigSomething is being changed. Same goes for OrigFallThroughCount.
// produce a vector of pointers. But if we are only unrolling, we want		// produce a vector of pointers. But if we are only unrolling, we want
		Ayal (Not Done): How about VecAverageTripCount = OrigAverageTripCount / (VF * UF);
// to produce a scalar GEP for each unroll part. Thus, the GEP we		// to produce a scalar GEP for each unroll part. Thus, the GEP we
// produce with the code below will be scalar (if VF == 1) or vector		// produce with the code below will be scalar (if VF == 1) or vector
// (otherwise). Note that for the unroll-only case, we still maintain		// (otherwise). Note that for the unroll-only case, we still maintain
// values in the vector mapping with initVector, as we do for other		// values in the vector mapping with initVector, as we do for other
		Ayal (Not Done): Just to clarify, maintaining branch frequencies through optimizations is best-effort and imprecise: a total weight that is not divisible by VF*UF implies that the trip count of at least one invocation was not divisible by VF*UF, not necessarily all of them, and the distribution of trip counts is not considered in addition to their sum. Setting PEIterCount = 0 and VecAverageTripCount = round(OrigAverageTripCount / (VF*UF)) when Cost->foldTailByMasking() is probably the best that can be done. The former is redundant given that it applies to dead code, and the latter should perhaps apply to all cases, in general.
		Ayal (Done): Instead of providing the explanation in a comment, it seems better to implement the code this way, leaving the +1 for the compiler to optimize. I.e., const uint64_t OrigHeaderBlockWeight = OrigBackedgeTakenWeight + OrigLoopEntryWeight; const uint64_t OrigAverageTripCount = OrigHeaderBlockWeight / OrigLoopEntryWeight;
// instructions.		// instructions.
for (unsigned Part = 0; Part < UF; ++Part) {		for (unsigned Part = 0; Part < UF; ++Part) {
// The pointer operand of the new GEP. If it's loop-invariant, we		// The pointer operand of the new GEP. If it's loop-invariant, we
// won't broadcast it.		// won't broadcast it.
		Ayal (Not Done): There's also the special case of requiresScalarEpilogue() where 0 < PEIterCount <= VF*UF for each invocation of the loop, and hence the average is also strictly positive, FWIW. But best keep the approximation general instead of trying to improve it, given the general lack of information.
		ebrevnov (Author, Done): Agreed. Let me remove this special case then.
auto *Ptr = IsPtrLoopInvariant		auto *Ptr = IsPtrLoopInvariant
? GEP->getPointerOperand()		? GEP->getPointerOperand()
: getOrCreateVectorValue(GEP->getPointerOperand(), Part);		: getOrCreateVectorValue(GEP->getPointerOperand(), Part);
		Ayal (Done): Better rename/expand "PE". PEIterCount >> RemainderLoopAverageTripCount?

// Collect all the indices for the new GEP. If any index is		// Collect all the indices for the new GEP. If any index is
// loop-invariant, we won't broadcast it.		// loop-invariant, we won't broadcast it.
SmallVector<Value *, 4> Indices;		SmallVector<Value *, 4> Indices;
		Ayal (Not Done): This assumes the number of times the vector loop will be reached is equal to the number of times the original scalar loop was reached (OrigFallThroughCount). This holds if Cost->foldTailByMasking(), but otherwise invocations whose trip count < VF*UF will bypass the vector loop (and also == VF*UF if requiresScalarEpilogue()), plus other run-time guards.
		ebrevnov (Author, Done): Please note that VecFallThrough is zero initially and set to OrigFallThroughCount only if the vector loop is expected to be executed (VecIterCount > 0).
		Ayal (Done): In the general context, "Vec" >> "UnrolledLoop". VecTakenCount >> UnrolledLoopBackedgeWeight; VecFallThrough >> UnrolledLoopExitWeight and/or UnrolledLoopEntryWeight.
for (auto Index : enumerate(GEP->indices())) {		for (auto Index : enumerate(GEP->indices())) {
Value *User = Index.value().get();		Value *User = Index.value().get();
		Ayal (Not Done): How about: if (UnrolledLoopAverageTripCount > 0) { UnrolledLoopEntryWeight = OrigLoopEntryWeight; uint64_t UnrolledLoopHeaderWeight = UnrolledLoopAverageTripCount * UnrolledLoopEntryWeight; // Analogous to computing OrigLoopAverageTripCount from Header and Entry weights above. UnrolledLoopBackedgeWeight = UnrolledLoopHeaderWeight - UnrolledLoopEntryWeight; } leaving the -1 optimization to the compiler.
		ebrevnov (Author, Done): That would make the computations less robust against overflow. Personally, I feel the way it's written today is no harder to understand.
if (IsIndexLoopInvariant[Index.index()])		if (IsIndexLoopInvariant[Index.index()])
Indices.push_back(User);		Indices.push_back(User);
else		else
Indices.push_back(getOrCreateVectorValue(User, Part));		Indices.push_back(getOrCreateVectorValue(User, Part));
}		}

		Ayal (Done): PETakenCount >> RemainderLoopBackedgeWeight; PEFallThroughCount >> RemainderLoopExitWeight and/or RemainderLoopEntryWeight.
// Create the new GEP. Note that this GEP may be a scalar if VF == 1,		// Create the new GEP. Note that this GEP may be a scalar if VF == 1,
		Ayal (Not Done): Similar to the above comment, invocations whose trip count is divisible by VF*UF will bypass the scalar remainder loop (without foldTailByMasking or requiresScalarEpilogue), so in general PEFallThroughCount <= OrigFallThroughCount.
		ebrevnov (Author, Done): Same explanation as for the above.
// but it should be a vector, otherwise.		// but it should be a vector, otherwise.
auto *NewGEP =		auto *NewGEP =
GEP->isInBounds()		GEP->isInBounds()
? Builder.CreateInBoundsGEP(GEP->getSourceElementType(), Ptr,		? Builder.CreateInBoundsGEP(GEP->getSourceElementType(), Ptr,
Indices)		Indices)
: Builder.CreateGEP(GEP->getSourceElementType(), Ptr, Indices);		: Builder.CreateGEP(GEP->getSourceElementType(), Ptr, Indices);
assert((VF == 1 \|\| NewGEP->getType()->isVectorTy()) &&		assert((VF == 1 \|\| NewGEP->getType()->isVectorTy()) &&
"NewGEP is not a pointer vector");		"NewGEP is not a pointer vector");
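The trip-count estimate discussed in the inline comments above can be sketched in isolation. The snippet below is a minimal standalone illustration, not the code from the patch: `estimateAverageTripCount` is a hypothetical helper, and returning 0 for a zero entry weight is an assumption made here to signal "no estimate". It uses the `Backedge / Entry + 1` form, which equals `(Backedge + Entry) / Entry` under integer division but does not need the intermediate sum.

```cpp
#include <cstdint>

// Hypothetical helper for illustration only; it is not part of LLVM.
// BackedgeTakenWeight: profile weight of the latch edge back to the header.
// LoopEntryWeight: weight of entering the loop, taken to equal the weight
//                  of the latch exit edge.
// Returns 0 when no estimate is available (an assumption of this sketch).
uint64_t estimateAverageTripCount(uint64_t BackedgeTakenWeight,
                                  uint64_t LoopEntryWeight) {
  if (LoopEntryWeight == 0)
    return 0;
  // (header executions) / (loop entries)
  //   == (Backedge + Entry) / Entry
  //   == Backedge / Entry + 1, which reduces the risk of overflowing
  //      the intermediate sum.
  return BackedgeTakenWeight / LoopEntryWeight + 1;
}
```

For a latch annotated with exit weight 1 and backedge weight 1023, this yields an average trip count of 1024.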

llvm/test/Transforms/LoopVectorize/check-prof-info.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
				; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=1 -S < %s \| FileCheck %s
				; RUN: opt -passes="print<block-freq>,loop-vectorize" -force-vector-width=4 -force-vector-interleave=4 -S < %s \| FileCheck %s -check-prefix=CHECK-MASKED

				Ayal (Not Done): May want to also check with UF>1.
				ebrevnov (Author, Done): I replaced the masked case, since we don't do anything special for it now.
				target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

				@a = dso_local global [1024 x i32] zeroinitializer, align 16
				Ayal (Not Done): Tests targeting x86 need to reside in LoopVectorize/X86.
				@b = dso_local global [1024 x i32] zeroinitializer, align 16

				; Check correctness of profile info for vectorization without epilog.
				; Function Attrs: nofree norecurse nounwind uwtable
				define dso_local void @_Z3foov() local_unnamed_addr #0 {
				; CHECK-LABEL: @_Z3foov(
				; CHECK: [[VECTOR_BODY:vector\.body]]:
				; CHECK: br i1 [[TMP:%.]], label [[MIDDLE_BLOCK:%.]], label %[[VECTOR_BODY]], !prof [[LP1_255:\!.*]],
				; CHECK: [[FOR_BODY:for\.body]]:
				; CHECK: br i1 [[EXITCOND:%.]], label [[FOR_END_LOOPEXIT:%.]], label %[[FOR_BODY]], !prof [[LP0_0:\!.*]],
				; CHECK-MASKED: [[VECTOR_BODY:vector\.body]]:
				; CHECK-MASKED: br i1 [[TMP:%.]], label [[MIDDLE_BLOCK:%.]], label %[[VECTOR_BODY]], !prof [[LP1_63:\!.*]],
				; CHECK-MASKED: [[FOR_BODY:for\.body]]:
				; CHECK-MASKED: br i1 [[EXITCOND:%.]], label [[FOR_END_LOOPEXIT:%.]], label %[[FOR_BODY]], !prof [[LP0_0:\!.*]],
				;
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds [1024 x i32], [1024 x i32]* @b, i64 0, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4, !tbaa !2
				%1 = trunc i64 %indvars.iv to i32
				%mul = mul nsw i32 %0, %1
				%arrayidx2 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %indvars.iv
				%2 = load i32, i32* %arrayidx2, align 4, !tbaa !2
				%add = add nsw i32 %2, %mul
				store i32 %add, i32* %arrayidx2, align 4, !tbaa !2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1024
				br i1 %exitcond, label %for.cond.cleanup, label %for.body, !prof !6
				}

				; Check correctness of profile info for vectorization with epilog.
				; Function Attrs: nofree norecurse nounwind uwtable
				define dso_local void @_Z3foo2v() local_unnamed_addr #0 {
				; CHECK-LABEL: @_Z3foo2v(
				; CHECK: [[VECTOR_BODY:vector\.body]]:
				; CHECK: br i1 [[TMP:%.]], label [[MIDDLE_BLOCK:%.]], label %[[VECTOR_BODY]], !prof [[LP1_255:\!.*]],
				; CHECK: [[FOR_BODY:for\.body]]:
				; CHECK: br i1 [[EXITCOND:%.]], label [[FOR_END_LOOPEXIT:%.]], label %[[FOR_BODY]], !prof [[LP1_2:\!.*]],
				; CHECK-MASKED: [[VECTOR_BODY:vector\.body]]:
				; CHECK-MASKED: br i1 [[TMP:%.]], label [[MIDDLE_BLOCK:%.]], label %[[VECTOR_BODY]], !prof [[LP1_63:\!.*]],
				; CHECK-MASKED: [[FOR_BODY:for\.body]]:
				; CHECK-MASKED: br i1 [[EXITCOND:%.]], label [[FOR_END_LOOPEXIT:%.]], label %[[FOR_BODY]], !prof [[LP1_2:\!.*]],
				;
				entry:
				br label %for.body

				for.cond.cleanup: ; preds = %for.body
				ret void

				for.body: ; preds = %for.body, %entry
				%indvars.iv = phi i64 [ 0, %entry ], [ %indvars.iv.next, %for.body ]
				%arrayidx = getelementptr inbounds [1024 x i32], [1024 x i32]* @b, i64 0, i64 %indvars.iv
				%0 = load i32, i32* %arrayidx, align 4, !tbaa !2
				%1 = trunc i64 %indvars.iv to i32
				%mul = mul nsw i32 %0, %1
				%arrayidx2 = getelementptr inbounds [1024 x i32], [1024 x i32]* @a, i64 0, i64 %indvars.iv
				%2 = load i32, i32* %arrayidx2, align 4, !tbaa !2
				%add = add nsw i32 %2, %mul
				store i32 %add, i32* %arrayidx2, align 4, !tbaa !2
				%indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
				%exitcond = icmp eq i64 %indvars.iv.next, 1027
				br i1 %exitcond, label %for.cond.cleanup, label %for.body, !prof !7
				}

				attributes #0 = { "use-soft-float"="false" }

				!llvm.module.flags = !{!0}
				!llvm.ident = !{!1}

				; CHECK: [[LP1_255]] = !{!"branch_weights", i32 1, i32 255}
				; CHECK: [[LP0_0]] = !{!"branch_weights", i32 0, i32 0}
				; CHECK-MASKED: [[LP1_63]] = !{!"branch_weights", i32 1, i32 63}
				; CHECK-MASKED: [[LP0_0]] = !{!"branch_weights", i32 0, i32 0}
				; CHECK: [[LP1_2]] = !{!"branch_weights", i32 1, i32 2}

				!0 = !{i32 1, !"wchar_size", i32 4}
				!1 = !{!"clang version 10.0.0 (https://github.com/llvm/llvm-project c292b5b5e059e6ce3e6449e6827ef7e1037c21c4)"}
				!2 = !{!3, !3, i64 0}
				!3 = !{!"int", !4, i64 0}
				!4 = !{!"omnipotent char", !5, i64 0}
				!5 = !{!"Simple C++ TBAA"}
				!6 = !{!"branch_weights", i32 1, i32 1023}
				!7 = !{!"branch_weights", i32 1, i32 1026}

llvm/test/Transforms/LoopVectorize/tripcount.ll

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret i32 0		ret i32 0
}		}

define i32 @foo_low_trip_count3(i1 %cond, i32 %bound) !prof !0 {		define i32 @foo_low_trip_count3(i1 %cond, i32 %bound) !prof !0 {
; The loop has low invocation count compare to the function invocation count,		; The loop has low invocation count compare to the function invocation count,
; but has a high trip count per invocation. Vectorize it.		; but has a high trip count per invocation. Vectorize it.

; CHECK-LABEL: @foo_low_trip_count3(		; CHECK-LABEL: @foo_low_trip_count3(
; CHECK: vector.body:		; CHECK: [[VECTOR_BODY:vector\.body]]:
		; CHECK: br i1 [[TMP9:%.]], label [[MIDDLE_BLOCK:%.]], label %[[VECTOR_BODY]], !prof [[LP3:\!.*]],
		; CHECK: [[FOR_BODY:for\.body]]:
		; CHECK: br i1 [[EXITCOND:%.]], label [[FOR_END_LOOPEXIT:%.]], label %[[FOR_BODY]], !prof [[LP6:\!.*]],
entry:		entry:
br i1 %cond, label %for.preheader, label %for.end, !prof !2		br i1 %cond, label %for.preheader, label %for.end, !prof !2

for.preheader:		for.preheader:
br label %for.body		br label %for.body

for.body: ; preds = %for.body, %entry		for.body: ; preds = %for.body, %entry
%i.08 = phi i32 [ 0, %for.preheader ], [ %inc, %for.body ]		%i.08 = phi i32 [ 0, %for.preheader ], [ %inc, %for.body ]
[...]
%inc = add nsw i32 %i.08, 1		%inc = add nsw i32 %i.08, 1
%exitcond = icmp slt i32 %i.08, 1000		%exitcond = icmp slt i32 %i.08, 1000
br i1 %exitcond, label %for.body, label %for.end, !prof !1		br i1 %exitcond, label %for.body, label %for.end, !prof !1

for.end: ; preds = %for.body		for.end: ; preds = %for.body
ret i32 0		ret i32 0
}		}

		; CHECK: [[LP3]] = !{!"branch_weights", i32 10, i32 2490}
		; CHECK: [[LP6]] = !{!"branch_weights", i32 10, i32 0}
		Ayal (Not Done): Following this, to clarify: original loop has latchExitWeight=10 and backedgeTakenWeight=10,000, therefore estimatedBackedgeTakenCount=1,000 and estimatedTripCount=1,001. Vectorizing by 4 produces estimatedTripCounts of 1,001/4=250 and 1,001%4=1 for vectorized and remainder loops, respectively, therefore their estimatedBackedgeTakenCounts are 249 and 0, and so the weights recorded with loop invocation weights of 10 are the above {10, 2490} and {10, 0}.
		ebrevnov (Author, Done): I will add this text to the test. Is that what you wanted (just not sure :-))?
		; original loop has latchExitWeight=10 and backedgeTakenWeight=10,000,
		; therefore estimatedBackedgeTakenCount=1,000 and estimatedTripCount=1,001.
		; Vectorizing by 4 produces estimatedTripCounts of 1,001/4=250 and 1,001%4=1
		; for vectorized and remainder loops, respectively, therefore their
		; estimatedBackedgeTakenCounts are 249 and 0, and so the weights recorded with
		; loop invocation weights of 10 are the above {10, 2490} and {10, 0}.

!0 = !{!"function_entry_count", i64 100}		!0 = !{!"function_entry_count", i64 100}
!1 = !{!"branch_weights", i32 100, i32 0}		!1 = !{!"branch_weights", i32 100, i32 0}
!2 = !{!"branch_weights", i32 10, i32 90}		!2 = !{!"branch_weights", i32 10, i32 90}
!3 = !{!"branch_weights", i32 10, i32 10000}		!3 = !{!"branch_weights", i32 10, i32 10000}
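The arithmetic behind the expected `branch_weights` in both test files can be reproduced with a small standalone sketch. It is not the patch's implementation: `LatchWeights`, `SplitProfile`, `weightsForTripCount` and `splitProfile` are hypothetical names, and the sketch assumes plain truncating division by `VF * UF`, with both weights set to zero when a loop's estimated trip count is zero, as described in the author's replies above.

```cpp
#include <cstdint>

// Hypothetical types and helpers for illustration; none of these are LLVM APIs.
struct LatchWeights {
  uint64_t Exit;     // weight of leaving the loop from the latch
  uint64_t Backedge; // weight of branching back to the header
};

struct SplitProfile {
  LatchWeights Unrolled;  // vectorized/unrolled loop
  LatchWeights Remainder; // scalar remainder (epilogue) loop
};

// A loop that is never expected to run gets {0, 0}; otherwise it is entered
// EntryWeight times and takes the backedge (TripCount - 1) times per entry.
LatchWeights weightsForTripCount(uint64_t AverageTripCount,
                                 uint64_t EntryWeight) {
  if (AverageTripCount == 0)
    return {0, 0};
  return {EntryWeight, (AverageTripCount - 1) * EntryWeight};
}

// Factor is VF * UF. A zero exit weight or zero factor yields all-zero
// weights (an assumption of this sketch, meaning "no usable profile").
SplitProfile splitProfile(uint64_t OrigExitWeight, uint64_t OrigBackedgeWeight,
                          uint64_t Factor) {
  if (OrigExitWeight == 0 || Factor == 0)
    return {{0, 0}, {0, 0}};
  const uint64_t OrigAverageTripCount =
      OrigBackedgeWeight / OrigExitWeight + 1;
  return {weightsForTripCount(OrigAverageTripCount / Factor, OrigExitWeight),
          weightsForTripCount(OrigAverageTripCount % Factor, OrigExitWeight)};
}
```

Under these assumptions, `splitProfile(10, 10000, 4)` gives {10, 2490} for the vector loop and {10, 0} for the remainder loop, matching `[[LP3]]` and `[[LP6]]` in tripcount.ll. Likewise `splitProfile(1, 1026, 4)` gives {1, 255} and {1, 2}, and `splitProfile(1, 1023, 16)` gives {1, 63} and {0, 0}, matching check-prof-info.ll.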

Revision Contents

Diff 239060