
[ARM][MVE] Tail-predication: support nested loops with dependent iterators.

Authored by SjoerdMeijer on Apr 28 2020, 6:28 AM.



We were not able to determine the number of elements processed by the loop (the scalar loop iteration count) for loops with dependent iterators, and so tail-predication was not triggering. The scalar loop iteration count is found by pattern matching the masked load/store instructions in the vector body that use this value, which is then checked against SCEV information to make sure that this is right. Not only does the SCEV expression for these types of loops look different, but finding the actual trip count also requires more work, and most changes here are related to this.

Supported now are cases where only the inner loop iterator receives values from its outer loop, like this nested loop example:

for (i = 0; i < N; i++)
  M = Size - i;
  for (j = 0; j < M; j++)

And also a 3d example like this because the SCEV expression is the same:

for (k = 0; k < N; k++)
  for (i = 0; i < N; i++)
    M = Size - i;
    for (j = 0; j < M; j++)

And this will cover most reduction kernels that we currently have.
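As a rough illustration of the supported shape, here is a minimal C++ sketch of a reduction whose inner trip count depends on the outer iterator, next to a simulated tail-predicated version. This is plain scalar C++ that only models predication by limiting the active lanes in the last vector iteration; it is not MVE code and not what the pass emits. The function names and the default `VF = 4` are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Scalar reference: the inner trip count M depends on the outer iterator i.
long sumScalar(const std::vector<int> &A, int N, int Size) {
  long Sum = 0;
  for (int i = 0; i < N; i++) {
    int M = Size - i;
    for (int j = 0; j < M; j++)
      Sum += A[j];
  }
  return Sum;
}

// Simulated tail-predicated form (hypothetical helper): each inner "vector"
// iteration handles up to VF elements, and only min(Elems, VF) lanes are
// active, so no scalar epilogue loop is needed.
long sumPredicated(const std::vector<int> &A, int N, int Size, int VF = 4) {
  assert(VF > 0 && "vector factor must be positive");
  long Sum = 0;
  for (int i = 0; i < N; i++) {
    int Elems = Size - i; // number of elements the inner loop must process
    for (int Base = 0; Elems > 0; Base += VF, Elems -= VF) {
      int Active = std::min(Elems, VF); // lanes enabled by the predicate
      for (int Lane = 0; Lane < Active; Lane++)
        Sum += A[Base + Lane];
    }
  }
  return Sum;
}
```

The point of the sketch is that the element count `Size - i` has to be known at the start of each inner loop; that is the value the pass needs to recover.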


The general case where any inner loop iterator can depend on its outer loop is not yet supported. For example, here i is initialised with k, and j is initialised with the value from its parent loop i:

for (k = 0; k < N; k++)
  for (i = k; i < N; i++)
    for (j = i; j < M; j++)

The reason that this is not yet supported is that pattern matching this SCEV is unwieldy: it almost requires a general SCEV visitor, as it involves the scAddExpr, scAddRecExpr, scUMaxExpr, and scSMaxExpr SCEV types, and it would still not be very general. Instead, as a follow-up, we would like to emit the scalar iteration count with an intrinsic, similar to how this is done for the hardware-loop instruction, which we can then simply pick up here, so that we don't need all this pattern matching anymore.
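One way to see why max expressions show up for these loops: when the inner iterator starts at a value from its parent loop, the trip count has to be clamped to zero for the case where the start is already past the bound. The sketch below shows this with plain integers only; it is not the SCEV the pass actually sees, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Closed form for the trip count of `for (j = i; j < M; j++)`.
// The max clamps the count to zero when i >= M.
int innerTripCount(int i, int M) { return std::max(i, M) - i; }

// Brute-force count of the same loop, used only to check the closed form.
int countIterations(int i, int M) {
  int Count = 0;
  for (int j = i; j < M; j++)
    Count++;
  return Count;
}
```

With `i` itself being a recurrence of an outer loop that starts at `k`, expressions of this kind nest, which is consistent with the mix of add, add-recurrence, and max nodes described above.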


Event Timeline

SjoerdMeijer created this revision. Apr 28 2020, 6:28 AM
samparker added inline comments. Apr 28 2020, 7:05 AM

I don't follow what the importance is here between a Value and an Instruction. Why couldn't a single nested loop have NumElements calculated by an instruction?


Missing LLVM_DEBUG macro.


I think it would be more informative to explain most of this using SCEV expression examples; I'm struggling to understand what's happening here.

SjoerdMeijer marked an inline comment as done. Apr 28 2020, 8:00 AM
SjoerdMeijer added inline comments.

Yeah, good point, I was just thinking the same: this would indeed be incomprehensible without some SCEV examples.

  • have rewritten the comments,
  • created one code path, removed the early distinction between values and instructions,
  • which did also improve the search a bit, for which I have added a test case.
samparker added a comment. Edited Apr 30 2020, 6:26 AM

I'm not convinced that this is the route to take. I think we should be questioning whether it's necessary to back-substitute the NumElements through MatchElemCountLoopSetup. If we don't have to do that, then I think we can use SCEV arithmetic, instead of pattern matching, to get our count:

--- a/llvm/lib/Target/ARM/MVETailPredication.cpp
+++ b/llvm/lib/Target/ARM/MVETailPredication.cpp
@@ -494,16 +494,12 @@ bool MVETailPredication::ComputeRuntimeElements(TripCountPattern &TCP) {
     } else
       return nullptr;
-    if (auto *RoundUp = dyn_cast<SCEVAddExpr>(S->getLHS())) {
-      if (auto *Const = dyn_cast<SCEVConstant>(RoundUp->getOperand(0))) {
-        if (Const->getAPInt() != (VF->getValue() - 1))
-          return nullptr;
-      } else
-        return nullptr;
-      return RoundUp->getOperand(1);
-    }
-    return nullptr;
+    auto *RoundUp = S->getLHS();
+    auto VFMinusOne =
+      SE->getAddExpr(S->getRHS(),
+                     SE->getNegativeSCEV(SE->getOne(S->getType())));
+    auto UndoRound = SE->getAddExpr(RoundUp, SE->getNegativeSCEV(VFMinusOne));
+    return UndoRound;
   // TODO: Can we use SCEV helpers, such as findArrayDimensions, and friends to
@@ -534,9 +530,6 @@ bool MVETailPredication::ComputeRuntimeElements(TripCountPattern &TCP) {
   SCEVExpander Expander(*SE, DL, "elements");
   TCP.NumElements = Expander.expandCodeFor(Elems, Elems->getType(), InsertPt);
-  if (!MatchElemCountLoopSetup(L, TCP.Shuffle, TCP.NumElements))
-    return false;
   return true;
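The arithmetic behind the proposed change can be sketched with plain unsigned integers. This assumes that `S` in the diff above models the rounded-up division `(Elems + VF - 1) /u VF`, so that subtracting `VF - 1` from its left-hand side gives back the element count. The `UDiv` struct and the helper names are hypothetical stand-ins for illustration, not LLVM APIs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a udiv expression: LHS /u RHS.
struct UDiv {
  uint32_t LHS; // Elems + (VF - 1)
  uint32_t RHS; // VF
};

// Build the rounded-up trip count expression (Elems + VF - 1) /u VF.
UDiv makeRoundedTripCount(uint32_t Elems, uint32_t VF) {
  assert(VF > 0 && "vector factor must be positive");
  return UDiv{Elems + (VF - 1), VF};
}

// Number of vector iterations the expression evaluates to.
uint32_t tripCount(const UDiv &S) { return S.LHS / S.RHS; }

// Mirror of the diff: VFMinusOne = RHS + (-1); UndoRound = LHS + (-VFMinusOne).
uint32_t undoRound(const UDiv &S) {
  uint32_t VFMinusOne = S.RHS - 1;
  return S.LHS - VFMinusOne;
}
```

For 10 elements and a vector factor of 4, the left-hand side is 13, the trip count is 3, and `undoRound` returns 10 without inspecting how the left-hand side was built.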
SjoerdMeijer abandoned this revision. Jun 4 2020, 1:17 AM

Hi Sam, thanks for looking at this, but we don't need this anymore because we're doing it properly in D79175, so abandoning this change.