This is an archive of the discontinued LLVM Phabricator instance.

[LV] Do not fold the tail when an IV is used outside of the loop.
AcceptedPublic

Authored by rickyz on Dec 4 2021, 10:01 PM.

Details

Reviewers
mkuper
fhahn
Ayal
Summary

Tail-folding does not support cases where an IV is used outside of the
loop. Before this change, the tail-folding legality check did not check
for this. In PR52335, this led to generating code with an incorrrect
index computation. Fix this by adding the missing check.

This change also adds assertions that there are no externally-used IVs
when performing tail folding.

Fixes PR52335.

Diff Detail

Event Timeline

rickyz created this revision.Dec 4 2021, 10:01 PM
rickyz requested review of this revision.Dec 4 2021, 10:01 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 4 2021, 10:01 PM
rickyz added a comment.EditedDec 4 2021, 10:08 PM

Here's some additional context on this bug:
https://github.com/llvm/llvm-project/issues/51677#issuecomment-991585227

I'm pretty new to this code, so please let me know anything if I've said anything inaccurate!

fhahn requested changes to this revision.Dec 22 2021, 3:54 AM

Thanks for the patch!

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
576

While this indeed fixes the issue for the test case, I think this is not really the right fix.

IIUC AllowedExit is supposed to contain the set of instructions that are *allowed* to have outside users. So we need to check *all* instructions that have outsides users are also in this set.

But that's not what is happening at the moment. We won't check reduction or induction phis against this set in general.

For tail folding, we need to make sure that there are no users outside the loop for any induction, independent of whether they are allowed to have outside users or not (i.e. no IV is allowed to have outside users at the moment for tail folding).

A direct way to fix this would be to use the following

--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1292,6 +1292,17 @@ bool LoopVectorizationLegality::prepareToFoldTailByMasking() {
   for (auto &Reduction : getReductionVars())
     ReductionLiveOuts.insert(Reduction.second.getLoopExitInstr());

+  for (auto &Induction : getInductionVars())
+    if (any_of(Induction.first->users(), [this](User *U) {
+      return !TheLoop->contains(cast<Instruction>(U));
+    })) {
+      LLVM_DEBUG(
+          dbgs()
+          << "LV: Cannot fold tail by masking, loop has an outside user for "
+          << *Induction.first << "\n");
+      return false;
+    }
+

Note that it might be possible to compute the final induction value even when tail-folding similar to how we compute the final reduction value, but that's only a potential follow up.

Independent of that, we may be able to allow users of the phi itself even if there is a SCEV predicate in general.

Note that the current logic does not prevent vectorization if the induction phi is used outside in the predicated case. Without tail folding, we never check if an induction phi has users outside the loop. A possible justification is the following: the phi value is guaranteed to be valid for each iteration of the loop as per the SCEV predicate, so even if it is used outside the loop, the value is only used if the predicate holds. For the incremented value this is not true. The value computed in the last iteration may be poison and should not be used outside the scope of the predicate.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
568

All changes in this file seem to be a cleanup independent of fixing the issue. If that's the case, could you submit them as a separate patch for review?

llvm/test/Transforms/LoopVectorize/pr52335.ll
6

The C example here is mostly irrelevant as the IR generated for it may change over time and depends on the flags provided and so on. The IR contains all the relevant information and is compact. Ideally the comment for the function would describe the issue. The C source is available at the bug report, which should be sufficient.

20

this should not be needed with -force-vector-width=4. To be safe, also pass -force-vector-interleave=1.

39

nit: moving the exit blocks after the loop slightly increases readability. Also the names of the blocks could be a bit more compact.

This revision now requires changes to proceed.Dec 22 2021, 3:54 AM
fhahn added a comment.Feb 2 2022, 8:30 AM

Reverse ping. @rickyz do you plan on picking this up again?

rickyz marked an inline comment as done.Feb 2 2022, 11:27 AM

Apologies for the long delay, I've been a little sidetracked from LLVM contributions recently - I plan to get back to this within the next week or so - thank you for your detailed comments!

rickyz updated this revision to Diff 408303.Feb 13 2022, 5:33 PM
rickyz retitled this revision from [LV] Fix logic preventing tail-folding when an IV is used outside of the loop to [LV] Do not fold the tail when an IV is used outside of the loop..
rickyz edited the summary of this revision. (Show Details)

Split out cleanup, address comments.

rickyz marked 3 inline comments as done.Feb 13 2022, 5:36 PM

Apologies for the super long delay on responding to this!

llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
576

Thank you for the detailed explanation and suggestion - that makes sense to me. I'll split out the AllowedExit change into a separate change (it should become a NFC change after this fix).

rickyz updated this revision to Diff 425853.Apr 28 2022, 11:19 AM
rickyz marked an inline comment as done.

Rebase.

Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 11:19 AM
fhahn accepted this revision.May 15 2022, 1:58 PM

LGTM, thanks!

llvm/test/Transforms/LoopVectorize/pr52335.ll
11

I think it would be good to check the full vector body to make the test more robust. It would also be good to submit the test separately before and then only have the test-diff for the change here.

This revision is now accepted and ready to land.May 15 2022, 1:58 PM
fhahn added inline comments.May 15 2022, 1:59 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3328

this is independent of the patch and could be moved out of the patch, right? If so, it would be good to split of.

fhahn added a comment.Aug 25 2022, 1:13 PM

@rickyz I think all this needs is to pre-commit the test and then land the change. Please let me know if you have commit access. If not, I can land the patches on your behalf. In that case, please let me know which name & email I should use to set the git authorship.

fhahn added a comment.May 13 2023, 1:28 AM

@rickyz I am planning on going ahead and landing this soon using the info from Phabricator.