This is an archive of the discontinued LLVM Phabricator instance.

[ARM][MVE] Tail-predication: check get.active.lane.mask's TC value
ClosedPublic

Authored by SjoerdMeijer on Aug 17 2020, 6:53 AM.

Details

Summary

This adds additional checks for the original scalar loop tripcount value, i.e. get.active.lane.mask second argument, and perform several sanity checks to see if it is of the form that we expect, similarly like we already do for the IV (the first argument of get.active.lane).

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Aug 17 2020, 6:53 AM
SjoerdMeijer requested review of this revision.Aug 17 2020, 6:53 AM
SjoerdMeijer added inline comments.Aug 17 2020, 9:03 AM
llvm/lib/Target/ARM/MVETailPredication.cpp
409

I probably forgot to state what I didn't do here.

Here, at this place, we could add more checks and cross reference the BTC obtained after IR pattern matching with the BTC that SCEV can calculate. For example, if we find a`%BTC = add %N, -1` for the get.active.lane intrinsic, then we could take the variable %N and see if that is used in the backedge taken count expression that SCEV can calculate for this loop. If we find %N as an operand in both expressions, we know both expressions are bound by the same variable, which is a good check to have. However, for the simple cases this is pretty simple, but as soon as we have a SCEV RecAddexpr, things get more complicated pretty fast. For example, if the pattern matched BTC instruction is described with:

{(-1 + (sext i16 %N to i32)),+,-1}<nw><%for.body>

and the BTC of the vectorised loop with a factor of 4 with:

((-4 + (4 * ({(3 + (sext i16 %N to i32))<nsw>,+,-1}<%for.body> /u 4))<nuw>) /u 4)

Then extracting %N from both of these expressions and comparing this involves writing a mini scev visitor which I am a bit reluctant to do, may not be so generic, and I was hoping that the checks already performed are good enough smoke tests....

Added a related comment to D85737; probably makes sense to continue that discussion there.

llvm/lib/Target/ARM/MVETailPredication.cpp
609

Why are we searching the basic block here, instead of just using dyn_cast<Instruction>(BTC)?

Thanks.

Following the discussion in D85737 and D86147, I am going to progress that first and change the BTC for the tripcount in the intrinsic. After that, I will return to this. We don't need to check the BTC, but need very similar checks for the tripcount.

SjoerdMeijer retitled this revision from [ARM][MVE] Tail-predication: check get.active.lane.mask's BTC value to [ARM][MVE] Tail-predication: check get.active.lane.mask's TC value.
SjoerdMeijer edited the summary of this revision. (Show Details)
SjoerdMeijer added a reviewer: samtebbs.

This is a (partial) rewrite of the patch after we changed the semantics of get.active.lane.mask to accept the loop tripcount as its second argument, and not the backedge-taken count. This now implements several checks to see if the tripcount belongs to this loop.

SjoerdMeijer added inline comments.Sep 9 2020, 7:47 AM
llvm/lib/Target/ARM/MVETailPredication.cpp
609

There was a use for this in the previous version of this patch, to reuse some IR, but it's not necessary anymore in this version, so has been removed.

samparker added inline comments.Sep 10 2020, 1:33 AM
llvm/lib/Target/ARM/MVETailPredication.cpp
390

nit: unnecessary parenthesis nesting.

422

It looks like this can be sunk into the if-statement that defines it.

424

isa<>

428

isa<>

432

nit: parenthesis

439

I guess we should return false for any other SCEVExpr type?

Cheers, comments addressed.

samparker accepted this revision.Sep 14 2020, 2:45 AM

I think this okay, certainly for Unknown and SCEVAddRec, and I wouldn't be up for having a big pattern searching again to completely double check that everything is as we expect. Maybe something will crop up that requires that, but at least this is a good start.

This revision is now accepted and ready to land.Sep 14 2020, 2:45 AM

Thanks for that, and agreed with your remarks. I think this is already a bit more generic/flexible and thus better than what we had, but certainly isn't fully generic. I am willing to review this once that becomes important. Then, this logic has to be moved to Scalarevolution and be made generic.

efriedma added inline comments.Sep 14 2020, 1:58 PM
llvm/lib/Target/ARM/MVETailPredication.cpp
376

Why do we need this check? Emitting vctp32 should be okay even if we can't actually tail-predicate the loop. The overflow check should be enough to ensure that's it's safe to emit vctp32, I think? Or am I forgetting somthing?

SjoerdMeijer added inline comments.Sep 14 2020, 2:15 PM
llvm/lib/Target/ARM/MVETailPredication.cpp
376

I could have a look where exactly, but if I am not mistaken you suggested on of the previous patches that we need to check that this tripcount/elementcount actually belongs to this loop. similarly like we already did for the IV. The reasoning was that for now get.active.lane.mask is emitted from the vector for nicely behaving loops, but it wouldn't be difficult to imagine that soon we will have a corresponding user-facing intrinsic. I think I am quoting that, if I remember well, and so these checks are needed.

And if we emit the VCTP, then that represents tail-predication. I.e., the VCTP intrinsic can be picked up in the LoweroverheadLoop pass and turned into a tail-predicated loop (after additional checks).

SjoerdMeijer added inline comments.Sep 14 2020, 3:55 PM
llvm/lib/Target/ARM/MVETailPredication.cpp
376

I did have a look because I was curious if I had starting imaging things. This is the remark I had in mind:

https://reviews.llvm.org/D79175#2063586

This is remark is explicitly about "L" though. And I thought there was a similar remark about the 2nd argument when it still was the BTC (previous version of this patch), but I don't think I can't find that easily now.

efriedma added inline comments.Sep 16 2020, 6:34 PM
llvm/lib/Target/ARM/MVETailPredication.cpp
376

Let me try to reformulate in a different way that might make it easier to understand. What you're doing here has two essential steps:

  1. Convert "llvm.get.active.lane.mask(X, Y)" to "llvm.arm.mve.vctp(Y - X)".
  2. Convert "Y - X" to a simpler induction variable.

In theory, you could split these steps; (1) could be legal even without (2).

Step 1 doesn't depend on the loop, or even that the statement is in a loop at all. The only requirement is that the subtraction itself doesn't overflow.

Step 2 requires that "Y - X" is equivalent to the new induction variable: it's needs to be an AddRec for the loop you're inserting the PHI into, and the generated PHI has to have the same base and increment.

Neither of these are should be directly connected to the trip count of the loop, I think.

The way the code is currently written, I think you're trying to prove more than you actually need to. If the induction variable has the "wrong" base or increment, ARMLowOverheadLoops will ultimately fail to tail-predicate, but I'm not sure that's actually a problem.

SjoerdMeijer added inline comments.Sep 17 2020, 7:15 AM
llvm/lib/Target/ARM/MVETailPredication.cpp
376

Thanks again Eli for explaining/elaborating.

Let me know what you prefer or think is best: rip this particular bit out (revert it), or leave it for the moment. I am asking because I will need some time to have a look at this:

Step 2 requires that "Y - X" is equivalent to the new induction variable: it's needs to be an AddRec for the loop you're inserting the PHI into, and the generated PHI has to have the same base and increment.

This "Y - X" expression is a difficult one to analyse (it can be), and I need to see how to do that.

efriedma added inline comments.Sep 17 2020, 2:26 PM
llvm/lib/Target/ARM/MVETailPredication.cpp
376

Let me know what you prefer or think is best: rip this particular bit out (revert it), or leave it for the moment. I am asking because I will need some time to have a look at this:

Post the patches in whatever order you think makes sense for review.

Step 2 requires that "Y - X" is equivalent to the new induction variable: it's needs to be an AddRec for the loop you're inserting the PHI into, and the generated PHI has to have the same base and increment.

This "Y - X" expression is a difficult one to analyse (it can be), and I need to see how to do that.

Step 2 should be easy. In the cases you're interested in, the "Y - X" SCEV expression should look something like {ElementCount,+,-VectorWidth}. VectorWidth is a constant, and you don't really need to analyze ElementCount.

Sorry, I wrote a reply end of last week, but apparently forgot to push submit. So please see my reply inline, but I will open a new review soon, where it's probably best to continue this discussion and my reply.

llvm/lib/Target/ARM/MVETailPredication.cpp
376

I think I got a much better understanding of your suggestions now while I tried out a few things, but that's what I wanted to double check.
Taking an example for {ElementCount,+,-VectorWidth}, and it is indeed easy to create a SCEV for that here, e.g.:

(-4 + %N)

At this point in he code here, we are not yet transforming the IR, but what we will generate is:

vector.body:
   %7 = phi i32 [ %N, %vecItor.ph ], [ %9, %vector.body ]
   %9 = sub i32 %7, 4
   br

If I understand things correctly, you would like to sanity check that SCEV expression (-4 + %N) matches this IR, and thus that Phi %7 is a nice AddRec, which I think it is by defintion?

I am not entirely sure what the added value would be of this check. Feels like that could be for example be an assert somewhere, and perhaps it is easier to do this in ARMLowOverheadLoops and not here as we don't have the transformed IR here.

I think you're trying to prove more than you actually need to.

I am kind of back to where I was before, and thinking that the current check makes some sense, but again I am of course perfectly happy to rip it out if we don't need it and let it be ARMLowOverheadLoops problem which indeed will probably not even trigger.

efriedma added inline comments.Sep 22 2020, 12:11 PM
llvm/lib/Target/ARM/MVETailPredication.cpp
376

The PHI check I was describing is essentially the existing if (VectorWidth == StepValue) check.