This is an archive of the discontinued LLVM Phabricator instance.

[LV][LoopInfo] Transform counting-down loops to counting-up loop
AbandonedPublic

Authored by SjoerdMeijer on Mar 26 2020, 5:01 AM.

Details

Summary

This adds a transformation to LoopInfo to reverse the induction variable if it is counting downwards. This canonicalisation of the loop is used by the vectoriser to discover earlier the primary induction variable. A minimal example is this:

void f (char *a, char *b, char *c, int N) {
  while (N-- > 0)
    *c++ = *a++ + *b++;
}

This will (of course) be vectorised, but when tail-folding is requested quite early in the vectorisation pipeline, it hasn't discovered the primary induction variable, inhibiting tail-folding for counting down loops which needs a primary IV for masking the loads/stores. By an early rewrite of this loop to a counting-up loop, we enable this tail-folding.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Mar 26 2020, 5:01 AM
samparker added inline comments.Mar 26 2020, 6:53 AM
llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
525

why aren't these lambdas just a bools?

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1828

same here.

Ayal added a comment.Mar 26 2020, 10:51 AM

It would indeed be good to have FoldTail handle loops that currently lack a Primary Induction Variable (0, +1). Such an %iv is needed in order to build the "%iv < %TripCount" comparison for masking.
In order to handle loops with reversed (%TripCount, -1) induction variables %riv instead of a PIV, this comparison should use %iv = %TripCount - %riv, or simply compare "%riv > 0" instead.
Note that LV eventually always builds a new PIV to control the vector loop, so best use it instead of an %riv; but this PIV is not (yet) represented in VPlan directly, i.e., in the absence of an original PIV.
There are however non-primary cases other than reversed whose tail would also be good to fold, e.g., when the start index is non-zero or when SCEV can somehow determine the trip count as in PR40816.

So it would be better to provide a more general solution, such as (1) having LV build this comparison using a new PIV if needed, which requires additional VPlan support; (2) canonicalize loops to have a PIV before they reach LV, based on (Predicated?) SCEV, possibly by extending/rerunning loop-simplify or indvars as suggested in https://reviews.llvm.org/D68577#1742745.
Note that (1) follows the general long-term direction of modelling the entire vector loop in VPlan.
Note that (2) may help LV in additional aspects, e.g., by clearing the discrepancy that last reverted D68577.

Sounds reasonable to go for (2) at this time?

Thanks for commenting!

Sounds reasonable to go for (2) at this time?

Yep, completely agreed. This was actually my initial approach, but now I can't remember why I gave up on it, possibly because loop-simplify wasn't doing what I wanted and/or I got the impression it could just be a small fix in the LV. But that's not an excuse, so will indeed go for the loop canonical form approach.

I will keep this ticket open and rebrand/reuse it when I have results with loop-simplify/indvars.

SjoerdMeijer retitled this revision from [LV] WIP: Tail-folding counting down loops to [LV][LoopInfo] Transform counting-down loops to counting-up loop.
SjoerdMeijer edited the summary of this revision. (Show Details)
SjoerdMeijer added a reviewer: dmgreen.

Rewrite using new function reverseInductionVariable in LoopInfo, see also the updated Title/Summary.

samparker added inline comments.Apr 1 2020, 1:27 AM
llvm/lib/Analysis/LoopInfo.cpp
241 ↗(On Diff #253896)

So, is this the canonical form? We won't see ICMP_NE?

245 ↗(On Diff #253896)

isa<> or use dyn_cast in the line above.

253 ↗(On Diff #253896)

Same query about ICMP_SGT 1 and ICMP_NE 0.

277 ↗(On Diff #253896)

Have we checked that the original indvar only has one use?

281 ↗(On Diff #253896)

I guess you could just create a cmp and replace the users, so we won't have to delete and create another branch.

290 ↗(On Diff #253896)

I think just using setIncoming will be fine.

SjoerdMeijer marked an inline comment as done.Apr 1 2020, 3:09 AM
SjoerdMeijer added inline comments.
llvm/lib/Analysis/LoopInfo.cpp
241 ↗(On Diff #253896)

More canonical is ICMP_EQ, which is what you'll get when you e.g. have while (N--) instead of the case while (N-- > 0) supported here. But at the same time time, I think we can have ICMP_NE too.
I considered supporting ICMP_SGT first as the minimal viable product, also to check how you reviewers and testing appreciate this change and approach, then follow-up to support some more predicates :-)

277 ↗(On Diff #253896)

Don't think so, thanks, will do.

281 ↗(On Diff #253896)

Because of the EQ predicate, I have the TRUE and FALSE block operands swapped compared to the original branch.

290 ↗(On Diff #253896)

That's what I wanted, but if I haven't overlooked anything, that requires an index, which means I need to do find the index first, then do setIncoming, and then I might as well remove it and add the updated one.

Ayal added inline comments.Apr 1 2020, 4:25 AM
llvm/include/llvm/Analysis/LoopInfo.h
580 ↗(On Diff #253896)

Following the above comment, this Analysis should rely on a previous Transformation to produce a canonical induction variable, if needed.

If this transformation is applied to a loop before deciding to vectorize it, there may be potential slowdowns when the loop remains unvectorized; so best handled independent of LV.

In terms of implementation, as far as LV is concerned, if getCanonicalInductionVariable() fails and one is to be constructed, do so by relying on SCEV::getBackedgeTakenCount() instead of pattern matching.

Cf. http://lists.llvm.org/pipermail/llvm-dev/2020-March/140433.html

SjoerdMeijer added inline comments.Apr 1 2020, 8:36 AM
llvm/include/llvm/Analysis/LoopInfo.h
580 ↗(On Diff #253896)

Sorry, but I just want to be sure, where does the transformation need to be implemented? Is that Indvar simplify, in SCEV, or a looputil function?

I may have also read a few things differently than I do know. For example, I thought I understood that it was undesired to do this in IndVarSimplify from the mail on the dev list. And also regarding that mail http://lists.llvm.org/pipermail/llvm-dev/2020-March/140433.html, and perhaps I should write this to the list, but I don't think I understand "As a consequence, any loop structure that is recognized by SCEV will (/should) not profit from rewriting."

Ayal added inline comments.Apr 1 2020, 9:08 AM
llvm/include/llvm/Analysis/LoopInfo.h
580 ↗(On Diff #253896)

If the transformation is to be applied to all loops, vectorized or not, it should be part of Indvar, as it once was.
The argument on the dev list is that (a) it has and may still cause slowdowns with small if any profit, and (b) passes should use SCEV instead of relying on specific forms of IR patterns. Tried to argue against both in http://lists.llvm.org/pipermail/llvm-dev/2020-April/140539.html. Perhaps only a limited form of the old iv-rewrite should be re-enabled, e.g., canonicalizing the primary iv only, in loops that "appear" vectorizable.

samparker added inline comments.Apr 2 2020, 1:31 AM
llvm/include/llvm/Analysis/LoopInfo.h
580 ↗(On Diff #253896)

Considering that this change is just to enable a specific part of the vectorizer to do its thing, I'm not convinced that extracting it out is the way to go, especially when it would cause many more (unnecessary) changes.

Meinersbur added inline comments.
llvm/lib/Analysis/LoopInfo.cpp
220 ↗(On Diff #253896)

Loop is part of the LLVMAnalysis library. Transformations should be in the LLVMTransform library.

Also, it is untypical for the analysis results to have methods for modification as well. These are typically found in the LLVMTransform library such as LoopUtils.cpp

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
1204

LoopVectorizationLegality::canVectorize is not really a place for changing the IR.

It's also a speculative transformation: The IR will have changed even if the loop at the end will not be vectorized (e.g. because it's not profitable).

SjoerdMeijer added inline comments.Apr 2 2020, 12:17 PM
llvm/lib/Analysis/LoopInfo.cpp
220 ↗(On Diff #253896)

Thanks for the suggestion, I'm happy to move this to LoopUtils.cpp. Also checking with @Ayal, is this something we can all live with?

Ayal added inline comments.Apr 2 2020, 5:21 PM
llvm/lib/Analysis/LoopInfo.cpp
220 ↗(On Diff #253896)

Constructing a canonical induction if one is missing and needed for LV's FoldTail, using SCEV::getBackedgeTakenCount() instead of pattern matching, could be placed in LoopUtils.cpp (or in LoopVectorize.cpp, using ILV::createInductionVariable()).

The concern is that doing so might cause slowdowns if the loop is not vectorized, something LV has been trying hard to avoid, so far successfully, which also motivated VPlan. (BTW, doing so may also cause slowdowns even if the loop is vectorized, but that's a different story ;-). Reverting control to the original IV if the loop is not vectorized seems awkward at best.

Let's try to think of alternative (1), i.e., of having LV represent in VPlan the new PIV that it will eventually create. A new PrimaryInductionRecipe (or VPInstruction) can be introduced and placed at the beginning of the first VPBasicBlock; its execute() will create a Phi-starting-at-zero, set ILV::Induction and possibly a PIV VPValue to it; the bump and compare could be taken care of in ILV.fixVectorizedLoop().
Interested in following this approach?

samparker added inline comments.Apr 3 2020, 12:50 AM
llvm/lib/Analysis/LoopInfo.cpp
290 ↗(On Diff #253896)

Ah yes, sorry, its called setIncomingValueForBlock.

SjoerdMeijer added inline comments.Apr 3 2020, 3:45 AM
llvm/lib/Analysis/LoopInfo.cpp
220 ↗(On Diff #253896)

Thanks for the suggestion Ayal. My initial thought was that this sounds like a lot of different moving parts for a simple thing like this. But if it overcomes the problem of doing this transformation unnecessary, then that sounds like a good plan. And also, creating a vplan recipe migt not be that bad, i.e. actually a good fit for this. I need to get up to speed with how the vplan machinery works, which is what I am doing first at the moment.

fhahn added inline comments.Apr 3 2020, 4:34 AM
llvm/lib/Analysis/LoopInfo.cpp
220 ↗(On Diff #253896)

IIUC to solve the issue, we have to 1) check if we have an induction variable we can reverse in LVL, 2) record that, 3) reverse the IV when generating code.

I might be missing something, but wouldn't it be easiest to record IVs that require reversing in LVL (similar to how we already record IVs and reductions) and use the reversing utility as a preparation step at codegen time?

At the moment, the def/use chains are not yet modeled in VPlan completely, so introducing a new recipe would not add a lot of value (I might be missing something though), as we cannot update the users in the VPlan to use the reversed IV.

Unless I am missing something, the issue could be addressed easiest by recording the information in LVL and reverse the induction variable using the utility as a codegen preparation step before executing the VPlan. Once the def/use chain is migrated to VPlan, it should be straight-forward to handle the case in VPlan.

Ayal added inline comments.Apr 3 2020, 5:05 AM
llvm/lib/Analysis/LoopInfo.cpp
220 ↗(On Diff #253896)

When building VPlan with FoldMask, a VPValue *IV of the Primary Induction Variable is needed by

// Introduce the early-exit compare IV <= BTC to form header block mask.
// This is used instead of IV < TC because TC may wrap, unlike BTC.
VPValue *IV = Plan->getVPValue(Legal->getPrimaryInduction());
VPValue *BTC = Plan->getOrCreateBackedgeTakenCount();
BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});

Approach (2) is to fix the incoming IR so it has a PIV, approach (1) is to represent the PIV using ingredient-less VPValue/Recipe.

SjoerdMeijer marked an inline comment as done.Apr 3 2020, 5:09 AM
SjoerdMeijer added inline comments.
llvm/lib/Analysis/LoopInfo.cpp
220 ↗(On Diff #253896)

Hi Florian,

IIUC to solve the issue, we have to 1) check if we have an induction variable we can reverse in LVL, 2) record that, 3) reverse the IV when generating code.

Yep, that's pretty much it. And just to add a little bit to your point 3: generate the code just before the vectoriser does most of its analysis/transformations.

I might be missing something, but wouldn't it be easiest to record IVs that require reversing in LVL (similar to how we already record IVs and reductions) and use the reversing utility as a preparation step at codegen time?

Now, it's my turn to double-check something :), do you mean like it is effectively done here in this patch? I mean, some details might need some work, code might need to be moved around a bit, but is this not essentially what this patch is doing? Or did you have something fundamentally different in mind?

fhahn added inline comments.Apr 6 2020, 12:42 PM
llvm/lib/Analysis/LoopInfo.cpp
220 ↗(On Diff #253896)

When building VPlan with FoldMask, a VPValue *IV of the Primary Induction Variable is needed by

// Introduce the early-exit compare IV <= BTC to form header block mask.
// This is used instead of IV < TC because TC may wrap, unlike BTC.
VPValue *IV = Plan->getVPValue(Legal->getPrimaryInduction());
VPValue *BTC = Plan->getOrCreateBackedgeTakenCount();
BlockMask = Builder.createNaryOp(VPInstruction::ICmpULE, {IV, BTC});

Approach (2) is to fix the incoming IR so it has a PIV, approach (1) is to represent the PIV using ingredient-less VPValue/Recipe.

Right, given that it is only used there, one relatively straight-forward way to handling this might be to add a dedicated VPValue (%vp.primaryiv) for the primary induction variable to the plan and use it there. I've put up a patch for that in D77577. I think it might be a beneficial refactor either way, as using Legal->getPrimaryInduction() unnecessarily couples Legal and codegen.

To use the re-written induction variable, you could just use something like State.PrimaryIV = OrigLoop->reverseInductionVariable(*ILV.PSE.getSE());

Currently your reverseInductionVariable only rewrites the induction variable and related checks, but leaves other users of the IV untouched. I guess that's definitely fine if there are no other users (as in your test cases).
Given that I think you also might be able to do the re-write in-place (updating the existing add/icmp/sub instructions instead of creating new ones). Then there would be no need to update the existing VPlan at the moment I think.

If there are other users, it would mean that we also might change the order of the some operations in the loop (e.g. the order in which memory locations are accessed). This could be avoided by rewriting the uses of the IV with a new expression.

Yep, that's pretty much it. And just to add a little bit to your point 3: generate the code just before the vectoriser does most of its analysis/transformations.

I think we *don't* want to reverse the IV before deciding to vectorize.

Ayal added inline comments.Apr 7 2020, 3:30 AM
llvm/lib/Analysis/LoopInfo.cpp
220 ↗(On Diff #253896)

Right, given that it is only used there, one relatively straight-forward way to handling this might be to add a dedicated VPValue (%vp.primaryiv) for the primary induction variable to the plan and use it there. I've put up a patch for that in D77577. I think it might be a beneficial refactor either way, as using Legal->getPrimaryInduction() unnecessarily couples Legal and codegen.

VPRecipeBuilder::createBlockInMask() is part of VPlan construction rather than codegen, so having it call Legal should be fine. D77635 which follows approach (1) above continues to rely on an original primary induction (start=0, step=1) and its widening, when available, and otherwise widens the new induction created during codegen to control the vector loop (start=0, step=VF*UF).

SjoerdMeijer abandoned this revision.Apr 22 2020, 2:02 AM

Implemented with a vplan transformation in D77635, abandoning this change.