This is an archive of the discontinued LLVM Phabricator instance.

[LV] Perform recurrence sinking directly on VPlan.
ClosedPublic

Authored by fhahn on Jan 25 2023, 3:51 PM.

Details

Summary

This patch updates LV to sink recipes directly using the VPlan use
chains. The initial patch only moves sinking to be purely VPlan-based.
Follow-up patches will move legality checks to VPlan as well.

At the moment, there's a single test failure remaining.

Diff Detail

Event Timeline

fhahn created this revision.Jan 25 2023, 3:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2023, 3:51 PM
fhahn requested review of this revision.Jan 25 2023, 3:51 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2023, 3:51 PM
fhahn updated this revision to Diff 492551.Jan 26 2023, 1:03 PM

Fix remaining erros, refactor. Should be ready for review now.

fhahn retitled this revision from [LV] Perform recurrence sinking directly on VPlan. (WIP) to [LV] Perform recurrence sinking directly on VPlan..Jan 26 2023, 1:04 PM
Ayal added a comment.Jan 31 2023, 11:42 AM

A very nice step forward! Raises a few thoughts regarding recipes of predicated instructions, arguably independent of the patch.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8934–8935

Comment above deserves update.

8981–8983

nit: FOR now stands for FixedOrderRecurrence which leads to fixFixedOrderRecurrences() ... how about addSplicesToFixedOrderRecurrences()?

llvm/lib/Transforms/Vectorize/VPlan.h
1262 ↗(On Diff #492551)

redundant - aka VPHeaderPHIRecipe::getBackedgeValue()?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
15

nit: lex order?

552

aka dominates?

Note for completeness: a recipe is considered to dominate itself, i.e., returns true if the same recipe is passed as both A and B.

579

This checking of regions is needed rather than recipeComesBefore()'s checking of parenting basic blocks, because neither basic block dominates the other when they reside in replicate regions? What if only A resides inside a replicate region?
When two recipes belong to the same block, their dominance is clear. Otherwise, if any recipe resides inside a replicate region, should the exit block of the replicate region be considered instead of its parenting basic block? Thereby collapsing the region into a node, or considering a pred-inst-phi instead of its pred-inst.

609

nit: assert with an error message that Current has at most a single value?

622

Clearly SinkCandidate should not be sunk past any of its operands, but this is a bit confusing: Curr dominates SinkCandidate by definition of being its operand, so if Previous=Target dominates Curr it follows that Previous dominates SinkCandidate - however such SinkCandidates are filtered from entering InstrsToSink?

700

(Independent of this patch) Another way of thinking about this: if Previous resides inside a replicate region, treat its pred-inst-phi as if it were Previous? Indeed, such a predicated instruction does not dominate the latch...

703

(Independent of this patch) Should this be under the if (Region)? I.e., Previous surely has a VPBasicBlock parent, and Region has a single successor, but the latter could potentially be a region rather than a VPBasicBlock?

707

hmm, could Previous be a header phi (i.e., induction, given that reductions feed only themselves inside the loop, and we exhausted all FOR's), or is this checking for blend/pred-inst-phi's?

714

(Independent of this patch) Note that if FOR->getBackedgeValue() differs from Previous then it is a FOR header phi, so in any case it will dominate RecurSplice.

llvm/lib/Transforms/Vectorize/VPlanTransforms.h
27

nit: place last to try and keep lex order?

fhahn updated this revision to Diff 493718.Jan 31 2023, 1:23 PM
fhahn marked 10 inline comments as done.

Address latest comments, thanks! I think it would be good to address the independent comments once the new sinking code has settled in.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8934–8935

Done, thanks!

8981–8983

Updated to adjustFixedOrderRecurrences, as it also requires sinking. WDYT?

llvm/lib/Transforms/Vectorize/VPlan.h
1262 ↗(On Diff #492551)

Ah yes, removed!

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
15

moved, I am not sure why clang-format didn't fix it

552

Renamed! Might make sense to move it to VPDominatorTree eventually, but probably best to avoid the linear scan

579

Adjusted! I also updated the code to not use a set but just sort the vector once after building the list.

609

Added, thanks! It might simplify things if we would provide VPRecipeBase::users() as follow-up

622

This is not needed as long as we keep the recipes to sink in dominance order. This should ensure that we can just update Previous = SinkCandidate. When looking at the operands, an operand may temporarily not dominate SInkCandidate, as it has already been moved past previous.

707

It is checking for header phis, can adjust once the new sinking code has settled in?

llvm/lib/Transforms/Vectorize/VPlanTransforms.h
27

Reordered, thanks!

Ayal added a comment.Feb 5 2023, 9:10 AM

Address latest comments, thanks! I think it would be good to address the independent comments once the new sinking code has settled in.

Sure, also raises the thought of initially representing predicated regions as recipes when dealing with dominance and sinking/motion, then expand them to replicated regions before sinking scalars and code generation.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8981–8983

Very good. A comment may help explain what adjust stands for.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
15

still one out of place?

562

How about something like:

auto *ParentA = A->getParent();
auto *ParentB = B->getParent();
if (ParentA == ParentB)
  return LocalComesBefore(A, B);

auto *RegionA = GetReplicateRegion(const_cast<VPRecipeBase *>(A));
auto *RegionB = GetReplicateRegion(const_cast<VPRecipeBase *>(B));
if (RegionA)
  ParentA = RegionA->getExiting();
if (RegionB)
  ParentB = RegionB->getExiting();
return VPDT.dominates(ParentA, ParentB);
578

RecipesToSink should be defined next to WorkList, or folded to reuse WorkList or Seen instead: RecipesToSink is a copy of WorkList excluding initial FOR and all VPPredInstPHIRecipes - these can alternatively be ignored when traversing RecipesToSink. See below regarding using Seen instead of RecipesToSink.

607

Recipes to sink are already held in a set: can Seen be set to use this order and serve as RecipesToSink?

611

What if RegionA but not RegionB or vise versa? Should fold the replicate region logic into dominates() above?

fhahn updated this revision to Diff 495617.Feb 7 2023, 12:47 PM
fhahn marked 4 inline comments as done.

Address latest comments, thanks!

fhahn added inline comments.Feb 7 2023, 12:49 PM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
15

argh yes, fixed!

562

Adjusted, thanks!

578

Yeah, updated to ignore the reciepes instead and used WorkList.

607

SmallPtrSet and SetVector cannot be sorted unfortunately. Updated to use WorkList.

611

Adjusted as above, thanks!

Ayal accepted this revision.Feb 7 2023, 4:05 PM

This looks good to me, thanks! Adding couple of final nits.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8979

(Independent of this patch) Can/Should adjustRecipesForReductions() be moved to VPlanTransforms?

8981–8983

Worth documenting what adjustFixedOrderRecurrences() does.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
15

ah, includes still seem out of lex order (they're not quite in order w/o the patch).

589

Worth asserting SinkCandidate != Previous? FORs shouldn't close a dependence cycle.

607

Thought was to use some other set with "compare" set to dominance order (std::set? Admittedly not SmallPtrSet) to indicate which candidates were already seen and to later traverse them in the desired order.

llvm/lib/Transforms/Vectorize/VPlanTransforms.h
88

Better keep optimizeForVFAndUF() last, as it operates after BestVP and BestUF have been chosen?

This revision is now accepted and ready to land.Feb 7 2023, 4:05 PM
fhahn marked 6 inline comments as done.Feb 8 2023, 7:43 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
8979

Unfortunately the function still uses the cost model in multiple places; some should be relatively straight-forward to replace as a first step though.

8981–8983

Done, also added a comment to VPlanTransforms.h

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
15

Yes, I think that was why clang-format moved it out-of-place. Will fix separately.

589

Added, thanks!

llvm/lib/Transforms/Vectorize/VPlanTransforms.h
88

Will do.

This revision was landed with ongoing or failed builds.Feb 8 2023, 7:49 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 5 inline comments as done.

Hello. We started hitting an infinite recursion in a PGO build after this. There is code to reproduce it in https://godbolt.org/z/EcvM8cGoK, hopefully.

fhahn added a comment.Feb 13 2023, 5:52 AM

Hello. We started hitting an infinite recursion in a PGO build after this. There is code to reproduce it in https://godbolt.org/z/EcvM8cGoK, hopefully.

Thanks! Should be fixed in af3c25dc3d87

ashay-github added a subscriber: ashay-github.EditedFeb 13 2023, 9:14 AM

@fhahn, This patch is causing a Windows Debug build (with VS 2022 MSVC 19.34.31937) of opt to crash for the llvm\test\Transforms\LoopVectorize\first-order-recurrence-sink-replicate-region.ll test (see stack trace below). Sadly, the fix in commit af3c25dc does not resolve the problem. I (locally) reverted this patch and validated that the crash is resolved.

Here is a partial stack trace. Could you revert / fix this crash soon?

0.      Program arguments: build\bin\opt.exe -passes=loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -force-widen-divrem-via-safe-divisor=0 -disable-output -debug-only=loop-vectorize llvm\test\Transforms\LoopVectorize\first-order-recurrence-sink-replicate-region.ll

Exception Code: 0x80000003

 #0 0x00007ff60564a39a std::_Debug_lt_pred<`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> &,llvm::VPRecipeBase * &,llvm::VPRecipeBase * &,0> C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\xutility:1096:0

 #1 0x00007ff60564b8da std::_Insertion_sort_unchecked<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:7519:0

 #2 0x00007ff60564c82d std::_Sort_unchecked<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:7640:0

 #3 0x00007ff605650b39 std::sort<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:7670:0

 #4 0x00007ff605650aa8 llvm::sort<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > llvm-project\llvm\include\llvm\ADT\STLExtras.h:1706:0

 #5 0x00007ff605650a61 llvm::sort<llvm::SmallVector<llvm::VPRecipeBase *,6> &,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > llvm-project\llvm\include\llvm\ADT\STLExtras.h:1711:0

 #6 0x00007ff6056448c3 sinkRecurrenceUsersAfterPrevious llvm-project\llvm\lib\Transforms\Vectorize\VPlanTransforms.cpp:619:0

 ...

EDIT: For some context, here is the function in the xutility file where the crash occurs (the failure is in the call to _STL_VERIFY):

template <class _Pr, class _Ty1, class _Ty2,
    enable_if_t<is_same_v<_Remove_cvref_t<_Ty1>, _Remove_cvref_t<_Ty2>>, int> = 0>
constexpr bool _Debug_lt_pred(_Pr&& _Pred, _Ty1&& _Left, _Ty2&& _Right) noexcept(
    noexcept(_Pred(_Left, _Right)) && noexcept(_Pred(_Right, _Left))) {
    // test if _Pred(_Left, _Right) and _Pred is strict weak ordering, when the arguments are the cv-same-type
    const auto _Result = static_cast<bool>(_Pred(_Left, _Right));
    if (_Result) {
        _STL_VERIFY(!_Pred(_Right, _Left), "invalid comparator");
    }

    return _Result;
}
fhahn added a comment.Feb 13 2023, 9:19 AM

@fhahn, This patch is causing a Windows Debug build (with VS 2022 MSVC 19.34.31937) of opt to crash for the llvm\test\Transforms\LoopVectorize\first-order-recurrence-sink-replicate-region.ll test (see stack trace below). Sadly, the fix in commit af3c25dc does not resolve the problem. I (locally) reverted this patch and validated that the crash is resolved.

Here is a partial stack trace. Could you revert / fix this crash soon?

0.      Program arguments: build\bin\opt.exe -passes=loop-vectorize -force-vector-width=2 -force-vector-interleave=1 -force-widen-divrem-via-safe-divisor=0 -disable-output -debug-only=loop-vectorize llvm\test\Transforms\LoopVectorize\first-order-recurrence-sink-replicate-region.ll

Exception Code: 0x80000003

 #0 0x00007ff60564a39a std::_Debug_lt_pred<`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> &,llvm::VPRecipeBase * &,llvm::VPRecipeBase * &,0> C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\xutility:1096:0

 #1 0x00007ff60564b8da std::_Insertion_sort_unchecked<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:7519:0

 #2 0x00007ff60564c82d std::_Sort_unchecked<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:7640:0

 #3 0x00007ff605650b39 std::sort<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:7670:0

 #4 0x00007ff605650aa8 llvm::sort<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > llvm-project\llvm\include\llvm\ADT\STLExtras.h:1706:0

 #5 0x00007ff605650a61 llvm::sort<llvm::SmallVector<llvm::VPRecipeBase *,6> &,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > llvm-project\llvm\include\llvm\ADT\STLExtras.h:1711:0

 #6 0x00007ff6056448c3 sinkRecurrenceUsersAfterPrevious llvm-project\llvm\lib\Transforms\Vectorize\VPlanTransforms.cpp:619:0

 ...

Thanks for the report. This seems to only show up on Windows, so I cannot reproduce this locally.

I *think* I've seen similar issues in the past, could you try to see if the following patch fixes the issue:

diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 10a67081a0d0..ddbd27223044 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -612,7 +612,7 @@ sinkRecurrenceUsersAfterPrevious(VPFirstOrderRecurrencePHIRecipe *FOR,

   // Keep recipes to sink ordered by dominance so earlier instructions are
   // processed first.
-  sort(WorkList, [&VPDT](const VPRecipeBase *A, const VPRecipeBase *B) {
+  stable_sort(WorkList, [&VPDT](const VPRecipeBase *A, const VPRecipeBase *B) {
     return dominates(A, B, VPDT);
   });
```?

could you try to see if the following patch fixes the issue:

Unfortunately, that didn't fix it. I see the call to stable_sort in the stack trace, but other than that, it's the same as before:

#0 0x00007ff662b7ce6a std::_Debug_lt_pred<`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> &,llvm::VPRecipeBase * &,llvm::VPRecipeBase * &,0> C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\xutility:1096:0

#1 0x00007ff662bb23ba std::_Insertion_sort_unchecked<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:7519:0
                                                                               
#2 0x00007ff662c1da94 std::stable_sort<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:8075:0

#3 0x00007ff662ba07f1 llvm::stable_sort<llvm::SmallVector<llvm::VPRecipeBase *,6> &,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > llvm-project\llvm\include\llvm\ADT\STLExtras.h:1955:0

#4 0x00007ff6664948c3 sinkRecurrenceUsersAfterPrevious llvm-project\llvm\lib\Transforms\Vectorize\VPlanTransforms.cpp:619:0

I pasted the code from xutility where the crash occurs and it seems to think that the comparator is invalid. Does that provide any hints / clues?

fhahn added a comment.Feb 13 2023, 1:28 PM

could you try to see if the following patch fixes the issue:

Unfortunately, that didn't fix it. I see the call to stable_sort in the stack trace, but other than that, it's the same as before:

#0 0x00007ff662b7ce6a std::_Debug_lt_pred<`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> &,llvm::VPRecipeBase * &,llvm::VPRecipeBase * &,0> C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\xutility:1096:0

#1 0x00007ff662bb23ba std::_Insertion_sort_unchecked<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:7519:0
                                                                               
#2 0x00007ff662c1da94 std::stable_sort<llvm::VPRecipeBase * *,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.34.31933\include\algorithm:8075:0

#3 0x00007ff662ba07f1 llvm::stable_sort<llvm::SmallVector<llvm::VPRecipeBase *,6> &,`sinkRecurrenceUsersAfterPrevious'::`2'::<lambda_2> > llvm-project\llvm\include\llvm\ADT\STLExtras.h:1955:0

#4 0x00007ff6664948c3 sinkRecurrenceUsersAfterPrevious llvm-project\llvm\lib\Transforms\Vectorize\VPlanTransforms.cpp:619:0

I pasted the code from xutility where the crash occurs and it seems to think that the comparator is invalid. Does that provide any hints / clues?

Thanks for sharing the additional info. I think this may be checking that the comparator doesn't return true for B < A if A < B. This can happen with the current comparator. I pushed a fix that uses properlyDominates to avoid this. Could you check if you still see the issue? If so, I'll revert the set of patches and will try to check this out on a Windows system,.Fix is 807d43239a5f

Fix is 807d43239a5f

Yup, that resolved the issue! opt no longer crashes, thanks!

Ayal added inline comments.Feb 14 2023, 1:26 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
607

As in SLPVectorizer.cpp's std::set<ScheduleData *, ScheduleDataCompare> ReadyInsts;, thereby saving the need to explicitly sort WorkList below. Best confirm that it works on Windows too...