This is an archive of the discontinued LLVM Phabricator instance.

Differential D76838

[LV][LoopInfo] Transform counting-down loops to counting-up loop
Abandoned, Public

Authored by SjoerdMeijer on Mar 26 2020, 5:01 AM.

Details

Reviewers

Ayal
fhahn
samparker
hsaito
dmgreen

Summary

This adds a transformation to LoopInfo that reverses the induction variable if it is counting downwards. The vectoriser uses this canonicalisation of the loop to discover the primary induction variable earlier. A minimal example is this:

void f (char *a, char *b, char *c, int N) {
  while (N-- > 0)
    *c++ = *a++ + *b++;
}

This will (of course) be vectorised. However, tail-folding is requested quite early in the vectorisation pipeline, at a point where the vectoriser hasn't yet discovered the primary induction variable. This inhibits tail-folding for counting-down loops, because tail-folding needs a primary IV for masking the loads/stores. By rewriting this loop early to a counting-up loop, we enable this tail-folding.
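
As an illustration only (this is not code from the patch), the source-level effect of the rewrite can be sketched as follows. The function names `addCountingDown`, `addCountingUp` and the helper `sameResult` are hypothetical; the counting-up form is a hand-written equivalent whose `i` has the shape of a primary induction variable (starts at 0, steps by 1).

```cpp
#include <cassert>
#include <cstring>

// Counting-down form from the summary: N itself is the induction variable.
void addCountingDown(const char *a, const char *b, char *c, int N) {
  while (N-- > 0)
    *c++ = *a++ + *b++;
}

// Hand-written counting-up equivalent (illustrative only): i starts at 0
// and steps by 1, which is the shape of a primary induction variable.
void addCountingUp(const char *a, const char *b, char *c, int N) {
  for (int i = 0; i < N; ++i)
    c[i] = a[i] + b[i];
}

// Hypothetical helper: true if both forms write the same results for n
// elements (n must not exceed 8, the size of the local buffers).
bool sameResult(int n) {
  char a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
  char b[8] = {10, 20, 30, 40, 50, 60, 70, 80};
  char down[8] = {0}, up[8] = {0};
  addCountingDown(a, b, down, n);
  addCountingUp(a, b, up, n);
  return std::memcmp(down, up, sizeof(down)) == 0;
}
```

Both forms execute the body the same number of times for any `N`, including `N <= 0`, where neither loop body runs.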

Event Timeline

SjoerdMeijer created this revision. Mar 26 2020, 5:01 AM

Herald added subscribers: danielkiss, rkruppe, hiraditya, kristof.beyls. Mar 26 2020, 5:01 AM

SjoerdMeijer mentioned this in D76686: [LV] widenIntOrFpInduction. NFC. Mar 26 2020, 5:04 AM

samparker added inline comments. Mar 26 2020, 6:53 AM

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
523	why aren't these lambdas just bools?
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1828 ↗	(On Diff #252802)	same here.

It would indeed be good to have FoldTail handle loops that currently lack a Primary Induction Variable (0, +1). Such an %iv is needed in order to build the "%iv < %TripCount" comparison for masking.
In order to handle loops with reversed (%TripCount, -1) induction variables %riv instead of a PIV, this comparison should use %iv = %TripCount - %riv, or simply compare "%riv > 0" instead.
Note that LV eventually always builds a new PIV to control the vector loop, so best use it instead of an %riv; but this PIV is not (yet) represented in VPlan directly, i.e., in the absence of an original PIV.
There are however non-primary cases other than reversed whose tail would also be good to fold, e.g., when the start index is non-zero or when SCEV can somehow determine the trip count as in PR40816.

So it would be better to provide a more general solution, such as (1) having LV build this comparison using a new PIV if needed, which requires additional VPlan support; (2) canonicalize loops to have a PIV before they reach LV, based on (Predicated?) SCEV, possibly by extending/rerunning loop-simplify or indvars as suggested in https://reviews.llvm.org/D68577#1742745.
Note that (1) follows the general long-term direction of modelling the entire vector loop in VPlan.
Note that (2) may help LV in additional aspects, e.g., by clearing the discrepancy that last reverted D68577.

Sounds reasonable to go for (2) at this time?
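
To make the two comparisons mentioned above concrete, here is a minimal sketch (not LLVM code) that models the lanes of one vector iteration with plain integers. The constant `VF` and the functions `maskFromIV`, `maskFromReversedIV` and `masksAgree` are hypothetical names chosen for this illustration; it assumes a reversed induction variable that starts at the trip count and steps by -1.

```cpp
#include <cassert>
#include <vector>

constexpr int VF = 4; // assumed vectorization factor, for illustration

// Mask built from a primary IV (0, +1): a lane is active iff iv < TC.
std::vector<bool> maskFromIV(int base, int TC) {
  std::vector<bool> mask(VF);
  for (int lane = 0; lane < VF; ++lane)
    mask[lane] = (base + lane) < TC;
  return mask;
}

// Mask built from a reversed IV (TC, -1): a lane is active iff riv > 0,
// where riv = TC - iv for the same scalar iteration.
std::vector<bool> maskFromReversedIV(int base, int TC) {
  std::vector<bool> mask(VF);
  for (int lane = 0; lane < VF; ++lane) {
    int riv = TC - (base + lane);
    mask[lane] = riv > 0;
  }
  return mask;
}

// True if both masks agree on every vector iteration of a folded loop.
bool masksAgree(int TC) {
  for (int base = 0; base < TC; base += VF)
    if (maskFromIV(base, TC) != maskFromReversedIV(base, TC))
      return false;
  return true;
}
```

With `TC = 10` and `VF = 4`, the last vector iteration starts at 8, so only the first two lanes are active under either formulation.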

Thanks for commenting!

> Sounds reasonable to go for (2) at this time?

Yep, completely agreed. This was actually my initial approach, but now I can't remember why I gave up on it, possibly because loop-simplify wasn't doing what I wanted and/or I got the impression it could just be a small fix in the LV. But that's not an excuse, so will indeed go for the loop canonical form approach.

I will keep this ticket open and rebrand/reuse it when I have results with loop-simplify/indvars.

Rewrite using new function reverseInductionVariable in LoopInfo, see also the updated Title/Summary.

samparker added inline comments. Apr 1 2020, 1:27 AM

llvm/lib/Analysis/LoopInfo.cpp
241	So, is this the canonical form? We won't see ICMP_NE?
245	isa<> or use dyn_cast in the line above.
253	Same query about ICMP_SGT 1 and ICMP_NE 0.
277	Have we checked that the original indvar only has one use?
281	I guess you could just create a cmp and replace the users, so we won't have to delete and create another branch.
290	I think just using setIncoming will be fine.

SjoerdMeijer marked an inline comment as done. Apr 1 2020, 3:09 AM

SjoerdMeijer added inline comments.

llvm/lib/Analysis/LoopInfo.cpp
241	More canonical is ICMP_EQ, which is what you'll get when you e.g. have `while (N--)` instead of the case `while (N-- > 0)` supported here. But at the same time, I think we can have ICMP_NE too. I considered supporting ICMP_SGT first as the minimal viable product, also to check how you reviewers and testing appreciate this change and approach, then follow up to support some more predicates :-)
277	Don't think so, thanks, will do.
281	Because of the EQ predicate, I have the TRUE and FALSE block operands swapped compared to the original branch.
290	That's what I wanted, but if I haven't overlooked anything, that requires an index, which means I need to find the index first, then do setIncoming, and then I might as well remove it and add the updated one.

Ayal added inline comments. Apr 1 2020, 4:25 AM

llvm/include/llvm/Analysis/LoopInfo.h
580	Following the above comment, this Analysis should rely on a previous Transformation to produce a canonical induction variable, if needed. If this transformation is applied to a loop before deciding to vectorize it, there may be potential slowdowns when the loop remains unvectorized; so best handled independent of LV. In terms of implementation, as far as LV is concerned, if getCanonicalInductionVariable() fails and one is to be constructed, do so by relying on SCEV::getBackedgeTakenCount() instead of pattern matching. Cf. http://lists.llvm.org/pipermail/llvm-dev/2020-March/140433.html

SjoerdMeijer added inline comments. Apr 1 2020, 8:36 AM

llvm/include/llvm/Analysis/LoopInfo.h
580	Sorry, but I just want to be sure, where does the transformation need to be implemented? Is that Indvar simplify, in SCEV, or a looputil function? I may have also read a few things differently than I do now. For example, I thought I understood that it was undesired to do this in IndVarSimplify from the mail on the dev list. And also regarding that mail http://lists.llvm.org/pipermail/llvm-dev/2020-March/140433.html, and perhaps I should write this to the list, but I don't think I understand "As a consequence, any loop structure that is recognized by SCEV will (/should) not profit from rewriting."

Ayal added inline comments. Apr 1 2020, 9:08 AM

llvm/include/llvm/Analysis/LoopInfo.h
580	If the transformation is to be applied to all loops, vectorized or not, it should be part of Indvar, as it once was. The argument on the dev list is that (a) it has and may still cause slowdowns with small if any profit, and (b) passes should use SCEV instead of relying on specific forms of IR patterns. Tried to argue against both in http://lists.llvm.org/pipermail/llvm-dev/2020-April/140539.html. Perhaps only a limited form of the old iv-rewrite should be re-enabled, e.g., canonicalizing the primary iv only, in loops that "appear" vectorizable.

samparker added inline comments. Apr 2 2020, 1:31 AM

llvm/include/llvm/Analysis/LoopInfo.h
580	Considering that this change is just to enable a specific part of the vectorizer to do its thing, I'm not convinced that extracting it out is the way to go, especially when it would cause many more (unnecessary) changes.

Meinersbur added a subscriber: Meinersbur. Apr 2 2020, 10:19 AM

Meinersbur added inline comments.

llvm/lib/Analysis/LoopInfo.cpp
220	`Loop` is part of the `LLVMAnalysis` library. Transformations should be in the `LLVMTransform` library. Also, it is atypical for analysis results to have methods for modification as well. These are typically found in the LLVMTransform library, such as in `LoopUtils.cpp`.
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1192	`LoopVectorizationLegality::canVectorize` is not really a place for changing the IR. It's also a speculative transformation: The IR will have changed even if the loop at the end will not be vectorized (e.g. because it's not profitable).

SjoerdMeijer added inline comments. Apr 2 2020, 12:17 PM

llvm/lib/Analysis/LoopInfo.cpp
220	Thanks for the suggestion, I'm happy to move this to LoopUtils.cpp. Also checking with @Ayal, is this something we can all live with?

Ayal added inline comments. Apr 2 2020, 5:21 PM

llvm/lib/Analysis/LoopInfo.cpp
220	Constructing a canonical induction if one is missing and needed for LV's FoldTail, using SCEV::getBackedgeTakenCount() instead of pattern matching, could be placed in LoopUtils.cpp (or in LoopVectorize.cpp, using ILV::createInductionVariable()). The concern is that doing so might cause slowdowns if the loop is not vectorized, something LV has been trying hard to avoid, so far successfully, which also motivated VPlan. (BTW, doing so may also cause slowdowns even if the loop is vectorized, but that's a different story ;-). Reverting control to the original IV if the loop is not vectorized seems awkward at best. Let's try to think of alternative (1), i.e., of having LV represent in VPlan the new PIV that it will eventually create. A new PrimaryInductionRecipe (or VPInstruction) can be introduced and placed at the beginning of the first VPBasicBlock; its execute() will create a Phi-starting-at-zero, set ILV::Induction and possibly a PIV VPValue to it; the bump and compare could be taken care of in ILV.fixVectorizedLoop(). Interested in following this approach?

samparker added inline comments. Apr 3 2020, 12:50 AM

llvm/lib/Analysis/LoopInfo.cpp
290	Ah yes, sorry, it's called setIncomingValueForBlock.

SjoerdMeijer added inline comments. Apr 3 2020, 3:45 AM

llvm/lib/Analysis/LoopInfo.cpp
220	Thanks for the suggestion Ayal. My initial thought was that this sounds like a lot of different moving parts for a simple thing like this. But if it overcomes the problem of doing this transformation unnecessarily, then that sounds like a good plan. And also, creating a vplan recipe might not be that bad, i.e. actually a good fit for this. I need to get up to speed with how the vplan machinery works, which is what I am doing first at the moment.

fhahn added inline comments. Apr 3 2020, 4:34 AM

llvm/lib/Analysis/LoopInfo.cpp
220	IIUC to solve the issue, we have to 1) check if we have an induction variable we can reverse in LVL, 2) record that, 3) reverse the IV when generating code. I might be missing something, but wouldn't it be easiest to record IVs that require reversing in LVL (similar to how we already record IVs and reductions) and use the reversing utility as a preparation step at codegen time? At the moment, the def/use chains are not yet modeled in VPlan completely, so introducing a new recipe would not add a lot of value (I might be missing something though), as we cannot update the users in the VPlan to use the reversed IV. Unless I am missing something, the issue could be addressed easiest by recording the information in LVL and reverse the induction variable using the utility as a codegen preparation step before executing the VPlan. Once the def/use chain is migrated to VPlan, it should be straight-forward to handle the case in VPlan.

Ayal added inline comments. Apr 3 2020, 5:05 AM

llvm/lib/Analysis/LoopInfo.cpp
220	When building VPlan with FoldMask, a VPValue *IV of the Primary Induction Variable is needed by

// Introduce the early-exit compare IV <= BTC to form header block mask.
// This is used instead of IV < TC because TC may wrap, unlike BTC.
VPValue *IV = Plan->getVPValue(Legal->getPrimaryInduction());
VPValue *BTC = Plan->getOrCreateBackedgeTakenCount();
BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});

Approach (2) is to fix the incoming IR so it has a PIV, approach (1) is to represent the PIV using ingredient-less VPValue/Recipe.
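
The remark that TC may wrap while BTC does not can be illustrated with a small model. This is a sketch under stated assumptions, not LLVM code: it uses an 8-bit unsigned type so the wrap is easy to see, and the function names `tripCount`, `activeULE` and `activeULT` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// BTC is the backedge-taken count; the trip count is BTC + 1, which wraps
// to 0 in an 8-bit type when BTC is 255 (a loop running 256 times).
uint8_t tripCount(uint8_t BTC) { return static_cast<uint8_t>(BTC + 1); }

// Mask compare in the form used above: IV <= BTC.
bool activeULE(uint8_t IV, uint8_t BTC) { return IV <= BTC; }

// Alternative form IV < TC, which goes wrong once TC has wrapped to 0.
bool activeULT(uint8_t IV, uint8_t BTC) { return IV < tripCount(BTC); }
```

For `BTC = 255`, `activeULE` keeps every lane active, whereas `activeULT` compares against a wrapped trip count of 0 and disables every lane. For smaller counts the two forms agree.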

SjoerdMeijer marked an inline comment as done. Apr 3 2020, 5:09 AM

SjoerdMeijer added inline comments.

llvm/lib/Analysis/LoopInfo.cpp
220	Hi Florian,

> IIUC to solve the issue, we have to 1) check if we have an induction variable we can reverse in LVL, 2) record that, 3) reverse the IV when generating code.

Yep, that's pretty much it. And just to add a little bit to your point 3: generate the code just before the vectoriser does most of its analysis/transformations.

> I might be missing something, but wouldn't it be easiest to record IVs that require reversing in LVL (similar to how we already record IVs and reductions) and use the reversing utility as a preparation step at codegen time?

Now, it's my turn to double-check something :), do you mean like it is effectively done here in this patch? I mean, some details might need some work, code might need to be moved around a bit, but is this not essentially what this patch is doing? Or did you have something fundamentally different in mind?

fhahn added inline comments. Apr 6 2020, 12:42 PM

llvm/lib/Analysis/LoopInfo.cpp
220	> When building VPlan with FoldMask, a VPValue *IV of the Primary Induction Variable is needed by
> // Introduce the early-exit compare IV <= BTC to form header block mask.
> // This is used instead of IV < TC because TC may wrap, unlike BTC.
> VPValue *IV = Plan->getVPValue(Legal->getPrimaryInduction());
> VPValue *BTC = Plan->getOrCreateBackedgeTakenCount();
> BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});
> Approach (2) is to fix the incoming IR so it has a PIV, approach (1) is to represent the PIV using ingredient-less VPValue/Recipe.

Right, given that it is only used there, one relatively straight-forward way of handling this might be to add a dedicated VPValue (`%vp.primaryiv`) for the primary induction variable to the plan and use it there. I've put up a patch for that in D77577. I think it might be a beneficial refactor either way, as using Legal->getPrimaryInduction() unnecessarily couples Legal and codegen.

To use the re-written induction variable, you could just use something like `State.PrimaryIV = OrigLoop->reverseInductionVariable(ILV.PSE.getSE());`

Currently your `reverseInductionVariable` only rewrites the induction variable and related checks, but leaves other users of the IV untouched. I guess that's definitely fine if there are no other users (as in your test cases). Given that, I think you also might be able to do the re-write in-place (updating the existing add/icmp/sub instructions instead of creating new ones). Then there would be no need to update the existing VPlan at the moment, I think.

If there are other users, it would mean that we also might change the order of some operations in the loop (e.g. the order in which memory locations are accessed). This could be avoided by rewriting the uses of the IV with a new expression.

> Yep, that's pretty much it. And just to add a little bit to your point 3: generate the code just before the vectoriser does most of its analysis/transformations.

I think we don't want to reverse the IV before deciding to vectorize.

Ayal mentioned this in D77635: [LV] Vectorize with FoldTail when Primary Induction is absent. Apr 7 2020, 2:27 AM

Ayal added inline comments. Apr 7 2020, 3:30 AM

llvm/lib/Analysis/LoopInfo.cpp
220	> Right, given that it is only used there, one relatively straight-forward way to handling this might be to add a dedicated VPValue (%vp.primaryiv) for the primary induction variable to the plan and use it there. I've put up a patch for that in D77577. I think it might be a beneficial refactor either way, as using Legal->getPrimaryInduction() unnecessarily couples Legal and codegen.

VPRecipeBuilder::createBlockInMask() is part of VPlan construction rather than codegen, so having it call Legal should be fine. D77635, which follows approach (1) above, continues to rely on an original primary induction (start=0, step=1) and its widening, when available, and otherwise widens the new induction created during codegen to control the vector loop (start=0, step=VF*UF).
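
As a rough model of the last point (an illustration under stated assumptions, not the implementation in D77635), the sketch below drives a tail-folded element-wise add with a fresh induction variable that starts at 0 and steps by VF*UF, widens it per lane, and masks each lane with the `IV <= BTC` compare. The function name `foldedAdd` and its parameters are hypothetical, and `a` and `b` are assumed to be at least as long as `c`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tail-folded element-wise add controlled by a new induction variable
// (start 0, step VF * UF). Lanes whose index exceeds BTC are masked off.
// Returns the number of vector-loop iterations executed.
std::size_t foldedAdd(const std::vector<int> &a, const std::vector<int> &b,
                      std::vector<int> &c, std::size_t VF, std::size_t UF) {
  const std::size_t N = c.size();
  if (N == 0)
    return 0;                        // the loop is skipped entirely
  const std::size_t BTC = N - 1;     // backedge-taken count of the scalar loop
  std::size_t iterations = 0;
  for (std::size_t iv = 0; iv <= BTC; iv += VF * UF) {
    for (std::size_t lane = 0; lane < VF * UF; ++lane) {
      std::size_t idx = iv + lane;   // widened induction value for this lane
      if (idx <= BTC)                // header block mask: IV <= BTC
        c[idx] = a[idx] + b[idx];
    }
    ++iterations;
  }
  return iterations;
}
```

Because the controlling induction variable is created fresh, the sketch does not depend on whether the original scalar loop counted up or down; it only needs the backedge-taken count.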

SjoerdMeijer mentioned this in rG9633fc14aef7: [LV][ARM] Add tail-folding tests for MVE. NFC. Apr 14 2020, 8:33 AM

Implemented with a vplan transformation in D77635, abandoning this change.

Revision Contents

Path	Size
llvm/include/llvm/Analysis/LoopInfo.h	5 lines
llvm/lib/Analysis/LoopInfo.cpp	99 lines
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp	12 lines
llvm/test/Transforms/LoopVectorize/ARM/tail-folding-counting-down.ll	407 lines

Diff 253896

llvm/include/llvm/Analysis/LoopInfo.h

[...]
public:
///
/// If InsertPt is specified, it is the point to hoist instructions to.
/// If null, the terminator of the loop preheader is used.
///
bool makeLoopInvariant(Instruction *I, bool &Changed,
Instruction *InsertPt = nullptr,
MemorySSAUpdater *MSSAU = nullptr) const;

		/// If this loop has an induction variable with a negative constant induction
		/// step value 1, then reverse the induction variable and return this, thus
		/// transforming a counting-down loop to a counting-up loop.
		PHINode *reverseInductionVariable(ScalarEvolution &SE);

/// Check to see if the loop has a canonical induction variable: an integer
/// recurrence that starts at 0 and increments by one each time through the
/// loop. If so, return the phi node that corresponds to it.
///
/// The IndVarSimplify pass transforms loops to have a canonical induction
/// variable.
///
PHINode *getCanonicalInductionVariable() const;

/// Obtain the unique incoming and back edge. Return false if they are
/// non-unique or the loop is dead; otherwise, return true.
bool getIncomingAndBackEdge(BasicBlock *&Incoming,
BasicBlock *&Backedge) const;

[...]

llvm/lib/Analysis/LoopInfo.cpp

[...]
#include "llvm/Analysis/MemorySSAUpdater.h"
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/Config/llvm-config.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DebugLoc.h"
#include "llvm/IR/Dominators.h"
		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/IRPrintingPasses.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
#include "llvm/IR/PassManager.h"
#include "llvm/InitializePasses.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
[...]
if (Op0 == &IndVar || Op0 == &StepInst)
return Op1;

if (Op1 == &IndVar || Op1 == &StepInst)
return Op0;

return nullptr;
}

		// Transform a counting-down loop:
		//
		// %N.addr.09 = phi i32 [ %dec, %loop.body ], [ %N, %preheader ]
		// %dec = add nsw i32 %N.addr.09, -1
		// %cmp = icmp sgt i32 %N.addr.09, 1
		// br i1 %cmp, label %loop.body, label %loopexit
		//
		// to a counting-up loop:
		//
		// %i.011 = phi i32 [ %inc, %loop.body ], [ 1, %preheader ]
		// %inc = add nuw nsw i32 %i.011, 1
		// %exitcond = icmp eq i32 %inc, %N
		// br i1 %exitcond, label %loopexit, label %loop.body
		//
		// by creating new increment, compare, and branch instructions, and return
		// the modified induction variable.
		//
		// For now, we only rewrite loops that have a step value of -1 and the
		// backedge is of the form: %iter > 1
		//
		PHINode *Loop::reverseInductionVariable(ScalarEvolution &SE) {
		BasicBlock *Preheader = getLoopPreheader();
		if (!Preheader)
		return nullptr;

		PHINode *IndVar = getInductionVariable(SE);
		if (!IndVar)
		return nullptr;

		InductionDescriptor ID;
		if (!InductionDescriptor::isInductionPHI(IndVar, this, &SE, ID))
		return nullptr;

		// TODO: for now only support a -1 step-value. This could be relaxed,
		// but need more work to rewrite the loop.
		ConstantInt *Step = ID.getConstIntStepValue();
		if (!Step || !Step->isMinusOne())
		return nullptr;

		ICmpInst *Latch = getLatchCmpInst(*this);
		if (!Latch || !Latch->hasOneUse() ||
		Latch->getSignedPredicate() != ICmpInst::ICMP_SGT)
		samparker: So, is this the canonical form? We won't see ICMP_NE?
		SjoerdMeijer (author): More canonical is ICMP_EQ, which is what you'll get when you e.g. have `while (N--)` instead of the case `while (N-- > 0)` supported here. But at the same time, I think we can have ICMP_NE too. I considered supporting ICMP_SGT first as the minimal viable product, also to check how you reviewers and testing appreciate this change and approach, then follow up to support some more predicates :-)
		return nullptr;

		User *Br = *Latch->user_begin();
		if (!dyn_cast<BranchInst>(Br))
		samparker: isa<> or use dyn_cast in the line above.
		return nullptr;

		BasicBlock *ExitBlock = getExitBlock();
		if (!ExitBlock)
		return nullptr;

		ConstantInt *LatchRHS = dyn_cast<ConstantInt>(Latch->getOperand(1));
		if (Latch->getOperand(0) != IndVar || !LatchRHS || !LatchRHS->isOne())
		samparker: Same query about ICMP_SGT 1 and ICMP_NE 0.
		return nullptr;

		Value *StartValue = ID.getStartValue();
		if (isa<UndefValue>(StartValue))
		return nullptr;

		// If the start value is a runtime value, we need a loop-guard that
		// ensures the loop executes at least one iteration, otherwise this
		// rewrite isn't valid. If the start value is a constant, we don't
		// necessarily need a guard as we can check that ourselves.
		ConstantInt *ConstStartValue = dyn_cast<ConstantInt>(StartValue);
		if (ConstStartValue && ConstStartValue->getValue().slt(1))
		return nullptr;
		else if (!isGuarded())
		return nullptr;

		Value *IndOp = ID.getInductionBinOp();
		IRBuilder<> IRB(IndVar->getParent());

		APInt NewStepValue = Step->getValue();
		NewStepValue.negate();

		// Create new increment, compare and branch instructions.
		Value *NewIndOp = IRB.CreateAdd(
		samparker: Have we checked that the original indvar only has one use?
		SjoerdMeijer (author, marked Done): Don't think so, thanks, will do.
		IndVar,
		llvm::ConstantInt::get(IndOp->getType(), NewStepValue.getSExtValue()));
		IndOp->replaceAllUsesWith(NewIndOp);
		IRB.CreateCondBr(IRB.CreateICmpEQ(NewIndOp, StartValue), ExitBlock,
		samparker: I guess you could just create a cmp and replace the users, so we won't have to delete and create another branch.
		SjoerdMeijer (author): Because of the EQ predicate, I have the TRUE and FALSE block operands swapped compared to the original branch.
		IndVar->getParent());

		// Remove all the old instructions.
		dyn_cast<BranchInst>(Br)->eraseFromParent();
		Latch->eraseFromParent();
		ID.getInductionBinOp()->eraseFromParent();

		// Modify the Phi node with the new start value, i.e. 1.
		IndVar->removeIncomingValue(Preheader);
		samparker: I think just using setIncoming will be fine.
		SjoerdMeijer (author): That's what I wanted, but if I haven't overlooked anything, that requires an index, which means I need to find the index first, then do setIncoming, and then I might as well remove it and add the updated one.
		samparker: Ah yes, sorry, it's called setIncomingValueForBlock.
		IndVar->addIncoming(llvm::ConstantInt::get(IndOp->getType(), 1),
		Preheader);

		// Return the modified Phi node, the induction variable.
		return IndVar;
		}
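
As a rough illustration of what a counting-down to counting-up rewrite has to preserve, the sketch below models the loop in plain C++ rather than LLVM IR. It is not the patch's implementation: the function names are hypothetical, the counting-up variable is assumed to start at 0 with step +1 (the canonical form the vectoriser looks for), and the original IV value is recomputed from the new one, along the lines of the "rewriting the uses of the IV with a new expression" suggestion in the review.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Rotated counting-down loop, as in the tests: the body runs first, then the
// latch tests "iv > 1" and the IV is decremented. The caller is assumed to
// have checked n > 0 (the loop guard).
std::vector<int32_t> visitCountingDown(int32_t n) {
  assert(n > 0 && "loop must be guarded");
  std::vector<int32_t> visited;
  int32_t iv = n;
  bool again;
  do {
    visited.push_back(iv);  // stand-in for a loop body that uses the IV
    again = iv > 1;
    iv -= 1;
  } while (again);
  return visited;
}

// Counting-up form with a canonical IV (start 0, step +1) and an equality
// exit test. The value the body used to see is recomputed as n - iv.
std::vector<int32_t> visitCountingUp(int32_t n) {
  assert(n > 0 && "loop must be guarded");
  std::vector<int32_t> visited;
  int32_t iv = 0;
  do {
    visited.push_back(n - iv);
    iv += 1;
  } while (iv != n);
  return visited;
}
```

Both functions visit n, n-1, ..., 1 in the same order. The guard matters because of the equality exit test: for n <= 0 the rotated counting-down form runs its body once and exits, whereas the counting-up form would keep iterating past the intended trip count.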

		Optional<Loop::LoopBounds> Loop::LoopBounds::getBounds(const Loop &L,
		PHINode &IndVar,
		ScalarEvolution &SE) {
		InductionDescriptor IndDesc;
		if (!InductionDescriptor::isInductionPHI(&IndVar, &L, &SE, IndDesc))
		return None;

		Value *InitialIVValue = IndDesc.getStartValue();

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

		void LoopVectorizationLegality::addInductionPhi( [...]
		}

		// Int inductions are special because we only allow one IV.
		if (ID.getKind() == InductionDescriptor::IK_IntInduction &&
		ID.getConstIntStepValue() && ID.getConstIntStepValue()->isOne() &&
		isa<Constant>(ID.getStartValue()) &&
		cast<Constant>(ID.getStartValue())->isNullValue()) {

		// Use the phi node with the widest type as induction. Use the last
		samparker: why aren't these lambdas just bools?
		// one if there are multiple (no good reason for doing this other
		// than it is expedient). We've checked that it begins at zero and
		// steps by one, so this is a canonical induction variable.
		if (!PrimaryInduction || PhiTy == WidestIndTy)
		PrimaryInduction = Phi;
		}

		// Both the PHI node itself, and the "post-increment" value feeding
		bool LoopVectorizationLegality::canVectorize(bool UseVPlanNativePath) { [...]
		if (NumBlocks != 1 && !canVectorizeWithIfConvert()) {
		LLVM_DEBUG(dbgs() << "LV: Can't if-convert the loop.\n");
		if (DoExtraAnalysis)
		Result = false;
		else
		return false;
		}

		// If this is a counting-down loop, reverse the induction variable and create
		// a counting-up loop. This will probably lead to more efficient
		// vectorisation, as this will enable earlier discovery of a primary
		// induction variable which is e.g. required for tail-folding of the scalar
		// epilogue loop.
		if (PHINode *IndVar = TheLoop->reverseInductionVariable(*PSE.getSE())) {
		Meinersbur: `LoopVectorizationLegality::canVectorize` is not really a place for changing the IR. It's also a speculative transformation: The IR will have changed even if the loop at the end will not be vectorized (e.g. because it's not profitable).
		PrimaryInduction = IndVar;
		LLVM_DEBUG(dbgs() << "LV: Loop after indvar reversal:\n";
		TheLoop->dumpVerbose(););
		PSE.getSE()->forgetLoop(TheLoop);
		}

		// Check if we can vectorize the instructions and CFG in this loop.
		if (!canVectorizeInstrs()) {
		LLVM_DEBUG(dbgs() << "LV: Can't vectorize the instructions or CFG\n");
		if (DoExtraAnalysis)
		Result = false;
		else
		return false;
		}
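
The reason tail-folding wants a primary induction variable is the header mask discussed in the review: "IV <= BTC" rather than "IV < TC", because the trip count may wrap while the backedge-taken count does not. The following is a small sketch of that difference using an 8-bit IV so the wrap is easy to see. The function names and the vectorisation factor are illustrative assumptions, and the lane index is assumed not to wrap in the IV type.

```cpp
#include <array>
#include <cstdint>

constexpr unsigned kVF = 4;  // illustrative vectorisation factor

// Header mask built from the backedge-taken count (BTC): lane L of the
// vector iteration starting at scalar index iv is active iff iv + L <= BTC.
std::array<bool, kVF> maskFromBTC(uint8_t iv, uint8_t btc) {
  std::array<bool, kVF> mask{};
  for (unsigned lane = 0; lane < kVF; ++lane)
    mask[lane] = static_cast<uint8_t>(iv + lane) <= btc;
  return mask;
}

// The same mask built from the trip count (TC): active iff iv + L < TC.
// This breaks when TC does not fit in the IV type.
std::array<bool, kVF> maskFromTC(uint8_t iv, uint8_t tc) {
  std::array<bool, kVF> mask{};
  for (unsigned lane = 0; lane < kVF; ++lane)
    mask[lane] = static_cast<uint8_t>(iv + lane) < tc;
  return mask;
}
```

With 256 iterations, BTC is 255 and still fits in 8 bits, so `maskFromBTC(252, 255)` enables all four lanes. The trip count 256 wraps to 0 in 8 bits, so `maskFromTC(252, 0)` disables every lane.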

llvm/test/Transforms/LoopVectorize/ARM/tail-folding-counting-down.ll

	; RUN: opt < %s -loop-vectorize -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -S | FileCheck %s
	; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilog -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -prefer-predicate-over-epilog -S | FileCheck %s --check-prefixes=CHECK,CHECK-TF,CHECK-PREFER
	; RUN: opt < %s -loop-vectorize -disable-mve-tail-predication=false -S | FileCheck %s			; RUN: opt < %s -loop-vectorize -disable-mve-tail-predication=false -S | FileCheck %s --check-prefixes=CHECK,CHECK-TF,CHECK-DISABLE-TP

	; Check that when we can't predicate this loop that it is still vectorised (with
	; an epilogue).
	; TODO: the reason this can't be predicated is because a primary induction
	; variable can't be found (not yet) for this counting down loop. But with that
	; fixed, this should be able to be predicated.

	; CHECK-LABEL: vector.body:

	target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"			target datalayout = "e-m:e-p:32:32-Fi8-i64:64-v128:64:128-a:0:32-n32-S64"
	target triple = "thumbv8.1m.main-arm-unknown-eabihf"			target triple = "thumbv8.1m.main-arm-unknown-eabihf"

	define dso_local void @foo(i8* noalias nocapture readonly %A, i8* noalias nocapture readonly %B, i8* noalias nocapture %C, i32 %N) #0 {			; This IR corresponds to this type of C-code:
				;
				; void f(char *a, char *b, char *c, int N) {
				; while (N-- > 0)
				; *c++ = *a++ + *b++;
				; }
				;
				define dso_local void @sgt_loopguard(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_loopguard(
				; CHECK: vector.body:
				; CHECK-TF: masked.load
				; CHECK-TF: masked.load
				; CHECK-TF: masked.store
				entry:
				%cmp5 = icmp sgt i32 %N, 0
				br i1 %cmp5, label %while.body.preheader, label %while.end

				while.body.preheader:
				br label %while.body

				while.body:
				%N.addr.09 = phi i32 [ %dec, %while.body ], [ %N, %while.body.preheader ]
				%c.addr.08 = phi i8* [ %incdec.ptr4, %while.body ], [ %c, %while.body.preheader ]
				%b.addr.07 = phi i8* [ %incdec.ptr1, %while.body ], [ %b, %while.body.preheader ]
				%a.addr.06 = phi i8* [ %incdec.ptr, %while.body ], [ %a, %while.body.preheader ]
				%dec = add nsw i32 %N.addr.09, -1
				%incdec.ptr = getelementptr inbounds i8, i8* %a.addr.06, i32 1
				%0 = load i8, i8* %a.addr.06, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %b.addr.07, i32 1
				%1 = load i8, i8* %b.addr.07, align 1
				%add = add i8 %1, %0
				%incdec.ptr4 = getelementptr inbounds i8, i8* %c.addr.08, i32 1
				store i8 %add, i8* %c.addr.08, align 1
				%cmp = icmp sgt i32 %N.addr.09, 1
				br i1 %cmp, label %while.body, label %while.end.loopexit

				while.end.loopexit:
				br label %while.end

				while.end:
				ret void
				}

				; Without a loop-guard, check that we don't reverse the induction variable
				; and thus that we don't tail-fold here.
				;
				define dso_local void @sgt_no_loopguard(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_no_loopguard(
				; CHECK: vector.body:
				; CHECK-TF-NOT: masked.load
				; CHECK-TF-NOT: masked.load
				; CHECK-TF-NOT: masked.store
				entry:
				br label %while.body

				while.body:
				%N.addr.09 = phi i32 [ %dec, %while.body ], [ %N, %entry ]
				%c.addr.08 = phi i8* [ %incdec.ptr4, %while.body ], [ %c, %entry ]
				%b.addr.07 = phi i8* [ %incdec.ptr1, %while.body ], [ %b, %entry ]
				%a.addr.06 = phi i8* [ %incdec.ptr, %while.body ], [ %a, %entry ]
				%dec = add nsw i32 %N.addr.09, -1
				%incdec.ptr = getelementptr inbounds i8, i8* %a.addr.06, i32 1
				%0 = load i8, i8* %a.addr.06, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %b.addr.07, i32 1
				%1 = load i8, i8* %b.addr.07, align 1
				%add = add i8 %1, %0
				%incdec.ptr4 = getelementptr inbounds i8, i8* %c.addr.08, i32 1
				store i8 %add, i8* %c.addr.08, align 1
				%cmp = icmp sgt i32 %N.addr.09, 1
				br i1 %cmp, label %while.body, label %while.end.loopexit

				while.end.loopexit:
				br label %while.end

				while.end:
				ret void
				}

				define dso_local void @sgt_extra_use_cmp(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_extra_use_cmp(
				; CHECK: vector.body:
				; CHECK-TF-NOT: masked.load
				; CHECK-TF-NOT: masked.load
				; CHECK-TF-NOT: masked.store
				entry:
				br label %while.body

				while.body:
				%N.addr.09 = phi i32 [ %dec, %while.body ], [ %N, %entry ]
				%c.addr.08 = phi i8* [ %incdec.ptr4, %while.body ], [ %c, %entry ]
				%b.addr.07 = phi i8* [ %incdec.ptr1, %while.body ], [ %b, %entry ]
				%a.addr.06 = phi i8* [ %incdec.ptr, %while.body ], [ %a, %entry ]
				%dec = add nsw i32 %N.addr.09, -1
				%incdec.ptr = getelementptr inbounds i8, i8* %a.addr.06, i32 1
				%0 = load i8, i8* %a.addr.06, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %b.addr.07, i32 1
				%1 = load i8, i8* %b.addr.07, align 1
				%add = add i8 %1, %0
				%incdec.ptr4 = getelementptr inbounds i8, i8* %c.addr.08, i32 1
				store i8 %add, i8* %c.addr.08, align 1
				%cmp = icmp sgt i32 %N.addr.09, 1
				%select = select i1 %cmp, i8 %0, i8 %1
				br i1 %cmp, label %while.body, label %while.end.loopexit

				while.end.loopexit:
				br label %while.end

				while.end:
				ret void
				}

				define dso_local void @sgt_const_tripcount(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_const_tripcount(
				; CHECK: vector.body:
				; CHECK-TF: %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; CHECK-TF: masked.load
				; CHECK-TF: masked.load
				; CHECK-TF: masked.store
				; CHECK-TF: %index.next = add i32 %index, 16
				; CHECK-TF: %[[CMP:.*]] = icmp eq i32 %index.next, 2048
				; CHECK-TF: br i1 %[[CMP]], label %middle.block, label %vector.body
				entry:
				%cmp5 = icmp sgt i32 %N, 0
				br i1 %cmp5, label %while.body.preheader, label %while.end

				while.body.preheader:
				br label %while.body

				while.body:
				%N.addr.09 = phi i32 [ %dec, %while.body ], [ 2048, %while.body.preheader ]
				%c.addr.08 = phi i8* [ %incdec.ptr4, %while.body ], [ %c, %while.body.preheader ]
				%b.addr.07 = phi i8* [ %incdec.ptr1, %while.body ], [ %b, %while.body.preheader ]
				%a.addr.06 = phi i8* [ %incdec.ptr, %while.body ], [ %a, %while.body.preheader ]
				%dec = add nsw i32 %N.addr.09, -1
				%incdec.ptr = getelementptr inbounds i8, i8* %a.addr.06, i32 1
				%0 = load i8, i8* %a.addr.06, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %b.addr.07, i32 1
				%1 = load i8, i8* %b.addr.07, align 1
				%add = add i8 %1, %0
				%incdec.ptr4 = getelementptr inbounds i8, i8* %c.addr.08, i32 1
				store i8 %add, i8* %c.addr.08, align 1
				%cmp = icmp sgt i32 %N.addr.09, 1
				br i1 %cmp, label %while.body, label %while.end.loopexit

				while.end.loopexit:
				br label %while.end

				while.end:
				ret void
				}

				define dso_local void @sgt_no_guard_0_startval(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_no_guard_0_startval(
				; CHECK-NOT: vector.body:
				entry:
				br label %while.body

				while.body:
				%N.addr.09 = phi i32 [ %dec, %while.body ], [ 0, %entry ]
				%c.addr.08 = phi i8* [ %incdec.ptr4, %while.body ], [ %c, %entry ]
				%b.addr.07 = phi i8* [ %incdec.ptr1, %while.body ], [ %b, %entry ]
				%a.addr.06 = phi i8* [ %incdec.ptr, %while.body ], [ %a, %entry]
				%dec = add nsw i32 %N.addr.09, -1
				%incdec.ptr = getelementptr inbounds i8, i8* %a.addr.06, i32 1
				%0 = load i8, i8* %a.addr.06, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %b.addr.07, i32 1
				%1 = load i8, i8* %b.addr.07, align 1
				%add = add i8 %1, %0
				%incdec.ptr4 = getelementptr inbounds i8, i8* %c.addr.08, i32 1
				store i8 %add, i8* %c.addr.08, align 1
				%cmp = icmp sgt i32 %N.addr.09, 1
				br i1 %cmp, label %while.body, label %while.end.loopexit

				while.end.loopexit:
				br label %while.end

				while.end:
				ret void
				}

				; Step values other than -1 are not yet supported.
				;
				define dso_local void @sgt_step_minus_two(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_step_minus_two(
				; CHECK: vector.body:
				; CHECK-TF-NOT: masked.load
				; CHECK-TF-NOT: masked.load
				; CHECK-TF-NOT: masked.store
				entry:
				%cmp5 = icmp sgt i32 %N, 0
				br i1 %cmp5, label %while.body.preheader, label %while.end

				while.body.preheader:
				br label %while.body

				while.body:
				%N.addr.09 = phi i32 [ %dec, %while.body ], [ %N, %while.body.preheader ]
				%c.addr.08 = phi i8* [ %incdec.ptr4, %while.body ], [ %c, %while.body.preheader ]
				%b.addr.07 = phi i8* [ %incdec.ptr1, %while.body ], [ %b, %while.body.preheader ]
				%a.addr.06 = phi i8* [ %incdec.ptr, %while.body ], [ %a, %while.body.preheader ]
				%dec = add nsw i32 %N.addr.09, -2
				%incdec.ptr = getelementptr inbounds i8, i8* %a.addr.06, i32 1
				%0 = load i8, i8* %a.addr.06, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %b.addr.07, i32 1
				%1 = load i8, i8* %b.addr.07, align 1
				%add = add i8 %1, %0
				%incdec.ptr4 = getelementptr inbounds i8, i8* %c.addr.08, i32 1
				store i8 %add, i8* %c.addr.08, align 1
				%cmp = icmp sgt i32 %N.addr.09, 1
				br i1 %cmp, label %while.body, label %while.end.loopexit

				while.end.loopexit:
				br label %while.end

				while.end:
				ret void
				}

				define dso_local void @sgt_step_not_constant(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N, i32 %S) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_step_not_constant(
				; CHECK-NOT: vector.body:
				entry:
				%cmp5 = icmp sgt i32 %N, 0
				br i1 %cmp5, label %while.body.preheader, label %while.end

				while.body.preheader:
				br label %while.body

				while.body:
				%N.addr.09 = phi i32 [ %dec, %while.body ], [ %N, %while.body.preheader ]
				%c.addr.08 = phi i8* [ %incdec.ptr4, %while.body ], [ %c, %while.body.preheader ]
				%b.addr.07 = phi i8* [ %incdec.ptr1, %while.body ], [ %b, %while.body.preheader ]
				%a.addr.06 = phi i8* [ %incdec.ptr, %while.body ], [ %a, %while.body.preheader ]
				%dec = add nsw i32 %N.addr.09, %S
				%incdec.ptr = getelementptr inbounds i8, i8* %a.addr.06, i32 1
				%0 = load i8, i8* %a.addr.06, align 1
				%incdec.ptr1 = getelementptr inbounds i8, i8* %b.addr.07, i32 1
				%1 = load i8, i8* %b.addr.07, align 1
				%add = add i8 %1, %0
				%incdec.ptr4 = getelementptr inbounds i8, i8* %c.addr.08, i32 1
				store i8 %add, i8* %c.addr.08, align 1
				%cmp = icmp sgt i32 %N.addr.09, 1
				br i1 %cmp, label %while.body, label %while.end.loopexit

				while.end.loopexit:
				br label %while.end

				while.end:
				ret void
				}

				define dso_local void @icmp_eq(i8* noalias nocapture readonly %A, i8* noalias nocapture readonly %B, i8* noalias nocapture %C, i32 %N) #0 {
				; CHECK-LABEL: @icmp_eq
				; CHECK: vector.body:
				; TODO
	entry:			entry:
	%cmp6 = icmp eq i32 %N, 0			%cmp6 = icmp eq i32 %N, 0
	br i1 %cmp6, label %while.end, label %while.body.preheader			br i1 %cmp6, label %while.end, label %while.body.preheader

	while.body.preheader:			while.body.preheader:
	br label %while.body			br label %while.body

	while.body:			while.body:
	Show All 14 Lines

	while.end.loopexit:			while.end.loopexit:
	br label %while.end			br label %while.end

	while.end:			while.end:
	ret void			ret void
	}			}

				; This IR corresponds to this type of C-code:
				;
				; void f(char *a, char *b, char * __restrict c, int N) {
				; for (int i = N; i>0; i--)
				; c[i] = a[i] + b[i];
				; }
				;
				define dso_local void @sgt_for_loop(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_for_loop(
				; CHECK: vector.body:
				; CHECK-TF: masked.load
				; CHECK-TF: masked.load
				; CHECK-TF: masked.store
				entry:
				%cmp5 = icmp sgt i32 %N, 0
				br i1 %cmp5, label %for.body.preheader, label %for.end

				for.body.preheader:
				br label %for.body

				for.body:
				%i.011 = phi i32 [ %dec, %for.body ], [ %N, %for.body.preheader ]
				%arrayidx = getelementptr inbounds i8, i8* %a, i32 %i.011
				%0 = load i8, i8* %arrayidx, align 1
				%arrayidx1 = getelementptr inbounds i8, i8* %b, i32 %i.011
				%1 = load i8, i8* %arrayidx1, align 1
				%add = add i8 %1, %0
				%arrayidx4 = getelementptr inbounds i8, i8* %c, i32 %i.011
				store i8 %add, i8* %arrayidx4, align 1
				%dec = add nsw i32 %i.011, -1
				%cmp = icmp sgt i32 %i.011, 1
				br i1 %cmp, label %for.body, label %for.end

				for.end:
				ret void
				}

				define dso_local void @sgt_for_loop_i64(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_for_loop_i64(
				; CHECK: vector.body:
				; CHECK-TF: %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
				; CHECK-PREFER: masked.load
				; CHECK-PREFER: masked.load
				; CHECK-PREFER: masked.store
				;
				; With -disable-mve-tail-predication=false, the cost-model returns that
				; creating a hardwareloop is not profitable/possible, so here we don't
				; expect the tail-folding:
				;
				; CHECK-DISABLE-TP-NOT: masked.load
				; CHECK-DISABLE-TP-NOT: masked.load
				; CHECK-DISABLE-TP-NOT: masked.store
				;
				; CHECK-TF: %index.next = add i64 %index, 16
				; CHECK-TF: %[[CMP:.*]] = icmp eq i64 %index.next, %n.vec
				; CHECK-TF: br i1 %[[CMP]], label %middle.block, label %vector.body
				entry:
				%cmp14 = icmp sgt i32 %N, 0
				br i1 %cmp14, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				%conv16 = zext i32 %N to i64
				br label %for.body

				for.cond.cleanup.loopexit:
				br label %for.cond.cleanup

				for.cond.cleanup:
				ret void

				for.body:
				%i.015 = phi i64 [ %dec, %for.body ], [ %conv16, %for.body.preheader ]
				%idxprom = trunc i64 %i.015 to i32
				%arrayidx = getelementptr inbounds i8, i8* %a, i32 %idxprom
				%0 = load i8, i8* %arrayidx, align 1
				%arrayidx4 = getelementptr inbounds i8, i8* %b, i32 %idxprom
				%1 = load i8, i8* %arrayidx4, align 1
				%add = add i8 %1, %0
				%arrayidx8 = getelementptr inbounds i8, i8* %c, i32 %idxprom
				store i8 %add, i8* %arrayidx8, align 1
				%dec = add nsw i64 %i.015, -1
				%cmp = icmp sgt i64 %i.015, 1
				br i1 %cmp, label %for.body, label %for.cond.cleanup.loopexit
				}

				; This IR corresponds to this nested-loop:
				;
				; for (int i = 0; i<N; i++)
				; for (int j = i+1; j>0; j--)
				; c[j] = a[j] + b[j];
				;
				; While the inner loop looks similar to previous examples, we can't
				; transform it because isGuarded returns false for the inner loop.
				;
				define dso_local void @sgt_nested_loop(i8* noalias nocapture readonly %a, i8* noalias nocapture readonly %b, i8* noalias nocapture %c, i32 %N) local_unnamed_addr #0 {
				; CHECK-LABEL: @sgt_nested_loop(
				; CHECK-NOT: vector.body:
				; CHECK-TF-NOT: masked.load
				; CHECK-TF-NOT: masked.load
				; CHECK-TF-NOT: masked.store
				; CHECK: }
				;
				entry:
				%cmp21 = icmp sgt i32 %N, 0
				br i1 %cmp21, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader:
				br label %for.body

				for.cond.loopexit:
				%exitcond = icmp eq i32 %add, %N
				br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body

				for.cond.cleanup.loopexit:
				br label %for.cond.cleanup

				for.cond.cleanup:
				ret void

				for.body:
				%i.022 = phi i32 [ %add, %for.cond.loopexit ], [ 0, %for.body.preheader ]
				%add = add nuw nsw i32 %i.022, 1
				br label %for.body4

				for.body4: ; preds = %for.body, %for.body4
				%j.020 = phi i32 [ %add, %for.body ], [ %dec, %for.body4 ]
				%arrayidx = getelementptr inbounds i8, i8* %a, i32 %j.020
				%0 = load i8, i8* %arrayidx, align 1
				%arrayidx5 = getelementptr inbounds i8, i8* %b, i32 %j.020
				%1 = load i8, i8* %arrayidx5, align 1
				%add7 = add i8 %1, %0
				%arrayidx9 = getelementptr inbounds i8, i8* %c, i32 %j.020
				store i8 %add7, i8* %arrayidx9, align 1
				%dec = add nsw i32 %j.020, -1
				%cmp2 = icmp sgt i32 %j.020, 1
				br i1 %cmp2, label %for.body4, label %for.cond.loopexit
				}

	attributes #0 = { nofree norecurse nounwind "target-features"="+armv8.1-m.main,+mve.fp" }			attributes #0 = { nofree norecurse nounwind "target-features"="+armv8.1-m.main,+mve.fp" }
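
The CHECK-TF lines in these tests look for masked.load and masked.store in vector.body, which is what a tail-folded loop is expected to contain. As a minimal scalar emulation of what such a loop computes, here is a plain C++ sketch. It assumes a lane count of 4 for readability (the tests above check an index step of 16), uses hypothetical function names, and models the masked memory operations with an ordinary `if`.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kLanes = 4;  // illustrative lane count

// Reference scalar loop: c[i] = a[i] + b[i].
std::vector<uint8_t> addScalar(const std::vector<uint8_t>& a,
                               const std::vector<uint8_t>& b) {
  std::vector<uint8_t> c(a.size());
  for (std::size_t i = 0; i < a.size(); ++i)
    c[i] = static_cast<uint8_t>(a[i] + b[i]);
  return c;
}

// Tail-folded form: no scalar epilogue. The loop runs over the trip count
// rounded up to a multiple of kLanes, and every lane is predicated on
// "index + lane <= BTC". Assumes b.size() == a.size().
std::vector<uint8_t> addTailFolded(const std::vector<uint8_t>& a,
                                   const std::vector<uint8_t>& b) {
  const std::size_t n = a.size();
  std::vector<uint8_t> c(n);
  if (n == 0)
    return c;  // loop guard: nothing to do
  const std::size_t btc = n - 1;  // backedge-taken count
  const std::size_t nVec = (n + kLanes - 1) / kLanes * kLanes;
  std::size_t index = 0;  // canonical IV: starts at 0, counts up
  do {
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
      const std::size_t i = index + lane;
      if (i <= btc)  // inactive lanes touch no memory
        c[i] = static_cast<uint8_t>(a[i] + b[i]);
    }
    index += kLanes;
  } while (index != nVec);
  return c;
}
```

For any input length, including lengths that are not a multiple of `kLanes`, `addTailFolded` is meant to produce the same result as `addScalar` without reading or writing past the end of the arrays.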

Revision Contents

Diff 253896

llvm/include/llvm/Analysis/LoopInfo.h

llvm/lib/Analysis/LoopInfo.cpp

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp

llvm/test/Transforms/LoopVectorize/ARM/tail-folding-counting-down.ll
