This adds checks for the original scalar loop tripcount value, i.e. the second argument of get.active.lane.mask, and performs several sanity checks to see whether it has the form we expect, similar to what we already do for the IV (the first argument of get.active.lane.mask).
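As an illustration only, one plausible sanity check of this kind compares the scalar trip count with the number of iterations the vector loop executes. The sketch below is a standalone C++ model, not code from MVETailPredication.cpp. `vectorIterations` and `tripCountMatchesLoop` are hypothetical helpers, and 32-bit unsigned counts are an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: ceil(TripCount / VectorWidth) in 32-bit unsigned
// arithmetic. Returns false instead of a result when TripCount +
// VectorWidth - 1 would wrap, i.e. when the round-up itself overflows.
bool vectorIterations(uint32_t TripCount, uint32_t VectorWidth, uint32_t &Out) {
  assert(VectorWidth != 0 && "vector width must be non-zero");
  if (TripCount > std::numeric_limits<uint32_t>::max() - (VectorWidth - 1))
    return false;
  Out = (TripCount + VectorWidth - 1) / VectorWidth;
  return true;
}

// Hypothetical sanity check: the scalar trip count handed to
// get.active.lane.mask should round up to exactly the number of
// iterations the vector loop executes.
bool tripCountMatchesLoop(uint32_t TripCount, uint32_t VectorWidth,
                          uint32_t VectorLoopIterations) {
  uint32_t Expected = 0;
  if (!vectorIterations(TripCount, VectorWidth, Expected))
    return false;
  return Expected == VectorLoopIterations;
}
```

For example, a trip count of 10 with a vector width of 4 corresponds to a vector loop of 3 iterations, while a trip count of 0xFFFFFFFF is rejected because rounding it up would wrap.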
llvm/lib/Target/ARM/MVETailPredication.cpp (line 409)
I probably forgot to state what I didn't do here. At this point, we could add more checks and cross-reference the BTC obtained from IR pattern matching with the BTC that SCEV calculates. For example, if we find `%BTC = add %N, -1` for the get.active.lane.mask intrinsic, then we could take the variable %N and see whether it is used in the backedge-taken count expression that SCEV calculates for this loop. If we find %N as an operand in both expressions, we know both expressions are bound by the same variable, which is a good check to have. For the simple cases this is straightforward, but as soon as we have a SCEVAddRecExpr, things get complicated pretty fast. For example, the pattern-matched BTC instruction may be described by `{(-1 + (sext i16 %N to i32)),+,-1}<nw><%for.body>` and the BTC of the loop vectorised with a factor of 4 by `((-4 + (4 * ({(3 + (sext i16 %N to i32))<nsw>,+,-1}<%for.body> /u 4))<nuw>) /u 4)`. Extracting %N from both of these expressions and comparing them would involve writing a mini SCEV visitor, which I am a bit reluctant to do and which may not be very generic, so I was hoping that the checks already performed are good enough smoke tests.
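The "mini SCEV visitor" mentioned above could look roughly like the following toy model. It uses a hand-rolled `Expr` tree rather than LLVM's SCEV classes, so `Expr`, `contains` and `boundBySameValue` are hypothetical names. It only checks that an opaque value such as %N occurs somewhere in both expressions, which is a necessary condition, not a proof that the two counts are equal.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a SCEV expression (not LLVM's SCEV classes): a node is a
// constant, an opaque value such as "%N", or a named operator ("add", "mul",
// "udiv", "sext", "addrec", ...) applied to sub-expressions.
struct Expr {
  enum Kind { Constant, Value, Op };
  Kind K = Constant;
  long long C = 0;                        // payload for Constant
  std::string Name;                       // value name or operator name
  std::vector<std::shared_ptr<Expr>> Ops; // operands for Op
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr constant(long long C) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Constant;
  E->C = C;
  return E;
}

ExprPtr value(const std::string &Name) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Value;
  E->Name = Name;
  return E;
}

ExprPtr op(const std::string &Name, std::vector<ExprPtr> Ops) {
  auto E = std::make_shared<Expr>();
  E->K = Expr::Op;
  E->Name = Name;
  E->Ops = std::move(Ops);
  return E;
}

// The "mini visitor": walk the tree and report whether the opaque value
// called Name occurs anywhere in it.
bool contains(const ExprPtr &E, const std::string &Name) {
  assert(E && "null expression");
  if (E->K == Expr::Value)
    return E->Name == Name;
  for (const ExprPtr &Sub : E->Ops)
    if (contains(Sub, Name))
      return true;
  return false;
}

// Cross-reference check: both the pattern-matched BTC and the BTC computed
// by SCEV must mention the same variable.
bool boundBySameValue(const ExprPtr &MatchedBTC, const ExprPtr &SCEVBTC,
                      const std::string &Name) {
  return contains(MatchedBTC, Name) && contains(SCEVBTC, Name);
}
```

Building the two expressions quoted above (with `sext` and `addrec` as plain operator nodes) and calling `boundBySameValue(Matched, FromSCEV, "%N")` returns true. Real SCEV nodes also carry types and wrap flags, which this model ignores.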
This is a (partial) rewrite of the patch after we changed the semantics of get.active.lane.mask to accept the loop tripcount as its second argument instead of the backedge-taken count. It now implements several checks to see whether the tripcount belongs to this loop.
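To make the change of semantics concrete, here is a simplified 4-lane model of the mask under both conventions. It assumes (my reading of the LangRef, not something taken from this patch) that lane I is active when Base + I < TripCount with no wrap-around, and that the older form used Base + I <= BTC. The function names are hypothetical. With TripCount = BTC + 1 the two agree, except when BTC is UINT32_MAX, where BTC + 1 wraps to 0.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mask4 = std::array<bool, 4>;

// Simplified model of get.active.lane.mask with the trip count as second
// argument: lane I is active when Base + I < TripCount. The sum is formed
// in 64 bits so that it cannot wrap.
Mask4 activeLaneMaskTC(uint32_t Base, uint32_t TripCount) {
  Mask4 M{};
  for (uint32_t I = 0; I < 4; ++I)
    M[I] = uint64_t(Base) + I < TripCount;
  return M;
}

// The earlier convention: the second argument is the backedge-taken count
// and the comparison is "<=" rather than "<".
Mask4 activeLaneMaskBTC(uint32_t Base, uint32_t BTC) {
  Mask4 M{};
  for (uint32_t I = 0; I < 4; ++I)
    M[I] = uint64_t(Base) + I <= BTC;
  return M;
}

// With TripCount = BTC + 1 both conventions give the same mask, as long as
// BTC + 1 itself does not wrap.
bool conventionsAgree(uint32_t Base, uint32_t BTC) {
  assert(BTC != UINT32_MAX && "BTC + 1 wraps to 0");
  return activeLaneMaskTC(Base, BTC + 1) == activeLaneMaskBTC(Base, BTC);
}
```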
llvm/lib/Target/ARM/MVETailPredication.cpp (line 609)
There was a use for this in the previous version of this patch, to reuse some IR, but it's no longer necessary in this version, so it has been removed.
I think this is okay, certainly for SCEVUnknown and SCEVAddRec, and I wouldn't want to go back to a big pattern search to double-check completely that everything is as we expect. Maybe something will crop up that requires that, but at least this is a good start.
Thanks for that; I agree with your remarks. I think this is already a bit more generic and flexible, and thus better than what we had, but it certainly isn't fully generic. I am willing to revisit this once that becomes important. At that point, this logic should be moved to ScalarEvolution and made generic.
llvm/lib/Target/ARM/MVETailPredication.cpp (line 376)
Why do we need this check? Emitting vctp32 should be okay even if we can't actually tail-predicate the loop. The overflow check should be enough to ensure that it's safe to emit vctp32, I think? Or am I forgetting something?
llvm/lib/Target/ARM/MVETailPredication.cpp (line 376)
I could look up where exactly, but if I am not mistaken you suggested in one of the previous patches that we need to check that this tripcount/element count actually belongs to this loop, similar to what we already did for the IV. The reasoning was that for now get.active.lane.mask is emitted by the vectorizer for well-behaved loops, but it isn't difficult to imagine that we will soon have a corresponding user-facing intrinsic. I think I am quoting that correctly, if I remember well, so these checks are needed. And if we emit the VCTP, then that represents tail-predication: the VCTP intrinsic can be picked up by the ARMLowOverheadLoops pass and turned into a tail-predicated loop (after additional checks).
llvm/lib/Target/ARM/MVETailPredication.cpp (line 376)
I did have a look, because I was curious whether I had started imagining things. This is the remark I had in mind, although it is explicitly about "L": https://reviews.llvm.org/D79175#2063586
I thought there was a similar remark about the second argument when it was still the BTC (previous version of this patch), but I can't find that easily now.
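llvm/lib/Target/ARM/MVETailPredication.cpp (line 376)
Let me try to reformulate this in a different way that might make it easier to understand. What you're doing here has two essential steps: (1) rewriting get.active.lane.mask(X, Y) as vctp(Y - X), and (2) replacing "Y - X" with a new induction variable (a PHI in the loop).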
In theory, you could split these steps; (1) could be legal even without (2). Step 1 doesn't depend on the loop, or even on the statement being in a loop at all. The only requirement is that the subtraction itself doesn't overflow. Step 2 requires that "Y - X" is equivalent to the new induction variable: it needs to be an AddRec for the loop you're inserting the PHI into, and the generated PHI has to have the same base and increment. Neither of these should be directly connected to the trip count of the loop, I think. The way the code is currently written, I think you're trying to prove more than you actually need to. If the induction variable has the "wrong" base or increment, ARMLowOverheadLoops will ultimately fail to tail-predicate, but I'm not sure that's actually a problem.
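Assuming step 1 is the rewrite of get.active.lane.mask(X, Y) into a VCTP of Y - X, its overflow condition can be checked on a small model. In the sketch below, `activeLaneMask`, `vctp32`, `canRewriteAsVCTP` and `rewriteAsVCTP` are hypothetical helpers, and the vctp32 model (the first N of 4 lanes are active, with N treated as unsigned) is a simplification. When X <= Y the two masks agree; when X > Y, Y - X wraps and the rewrite would enable every lane.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Mask4 = std::array<bool, 4>;

// Simplified model of get.active.lane.mask(X, Y) on 4 lanes: lane I is
// active when X + I < Y, evaluated in 64 bits so the sum cannot wrap.
Mask4 activeLaneMask(uint32_t X, uint32_t Y) {
  Mask4 M{};
  for (uint32_t I = 0; I < 4; ++I)
    M[I] = uint64_t(X) + I < Y;
  return M;
}

// Simplified model of a 32-bit-element VCTP: the first N of the 4 lanes
// are active. Treating N as unsigned is an assumption of this sketch.
Mask4 vctp32(uint32_t N) {
  Mask4 M{};
  for (uint32_t I = 0; I < 4; ++I)
    M[I] = I < N;
  return M;
}

// Legality condition for step 1: the subtraction Y - X must not wrap.
bool canRewriteAsVCTP(uint32_t X, uint32_t Y) { return X <= Y; }

// Step 1: replace get.active.lane.mask(X, Y) by vctp32(Y - X).
Mask4 rewriteAsVCTP(uint32_t X, uint32_t Y) {
  assert(canRewriteAsVCTP(X, Y) && "Y - X would wrap");
  return vctp32(Y - X);
}
```

Note that nothing here refers to a loop, which matches the point that step 1 only needs the subtraction not to overflow.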
llvm/lib/Target/ARM/MVETailPredication.cpp (line 376)
Thanks again, Eli, for explaining and elaborating. Let me know what you prefer or think is best: rip this particular bit out (revert it), or leave it for the moment. I am asking because I will need some time to look at this: the "Y - X" expression can be difficult to analyse, and I need to work out how to do that.
llvm/lib/Target/ARM/MVETailPredication.cpp (line 376)
Post the patches in whatever order you think makes sense for review.
Step 2 should be easy. In the cases you're interested in, the "Y - X" SCEV expression should look something like `{ElementCount,+,-VectorWidth}`. VectorWidth is a constant, and you don't really need to analyze ElementCount.
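Step 2 then amounts to comparing the base and increment of the AddRec with those of the generated PHI. The sketch below models `{ElementCount,+,-VectorWidth}` as a toy `AddRec` and the PHI as a hypothetical `CounterPHI`; neither is an LLVM class. ElementCount is only compared, never analysed.

```cpp
#include <cassert>
#include <cstdint>

// Toy affine recurrence {Start,+,Step}: its value at iteration K is
// Start + K * Step.
struct AddRec {
  int64_t Start;
  int64_t Step;
  int64_t at(int64_t K) const { return Start + K * Step; }
};

// Hypothetical model of the PHI that tail-predication would create:
//   %elts = phi [ Start, %vector.ph ], [ %elts.next, %vector.body ]
//   %elts.next = sub %elts, VectorWidth
struct CounterPHI {
  int64_t Start;
  int64_t VectorWidth;
};

// Step 2 is valid when the PHI has the same base and increment as the
// AddRec for "Y - X": Start == ElementCount and Step == -VectorWidth.
bool phiMatchesAddRec(const CounterPHI &P, const AddRec &R) {
  return P.Start == R.Start && -P.VectorWidth == R.Step;
}

// Brute-force cross-check: step the PHI and compare with the closed form
// for the first Iters iterations.
bool agreeForIterations(const CounterPHI &P, const AddRec &R, int64_t Iters) {
  assert(Iters >= 0 && "iteration count must be non-negative");
  int64_t V = P.Start;
  for (int64_t K = 0; K < Iters; ++K) {
    if (V != R.at(K))
      return false;
    V -= P.VectorWidth;
  }
  return true;
}
```

For `{10,+,-4}`, a PHI starting at 10 and decremented by 4 matches, while one decremented by 8 does not.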
Sorry, I wrote a reply at the end of last week but apparently forgot to press submit. Please see my reply inline; I will also open a new review soon, where it's probably best to continue this discussion.
llvm/lib/Target/ARM/MVETailPredication.cpp (line 376)
I think I now have a much better understanding of your suggestions after trying out a few things, but I wanted to double-check that.
(-4 + %N)
At this point in the code we are not yet transforming the IR, but what we will generate is:

    vector.body:
      %7 = phi i32 [ %N, %vector.ph ], [ %9, %vector.body ]
      %9 = sub i32 %7, 4
      br

If I understand things correctly, you would like to sanity-check that the SCEV expression (-4 + %N) matches this IR, and thus that PHI %7 is a nice AddRec, which I think it is by definition? I am not entirely sure what the added value of this check would be. It feels like it could be an assert somewhere, for example, and perhaps it is easier to do this in ARMLowOverheadLoops rather than here, as we don't have the transformed IR here.
I am kind of back to where I was before, thinking that the current check makes some sense, but again I am of course perfectly happy to rip it out if we don't need it and let it be ARMLowOverheadLoops' problem, which will indeed probably never trigger.
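The IR sketched in the comment above can be simulated to see why the PHI behaves as an element counter. This sketch assumes a vector width of 4 (i32 elements) and models the VCTP on %7 as enabling the lanes below the remaining element count; `simulate` and `LoopStats` are hypothetical names. Over ceil(N / 4) iterations, exactly N lanes end up active.

```cpp
#include <cassert>
#include <cstdint>

struct LoopStats {
  uint32_t Iterations;  // number of vector.body iterations
  uint32_t ActiveLanes; // total number of lanes enabled by the predicate
};

// Simulates the tail-predicated loop for N elements with a vector width
// of 4:
//   %7 = phi i32 [ %N, %vector.ph ], [ %9, %vector.body ]
//   %9 = sub i32 %7, 4
// with a VCTP on %7 enabling the lanes below the remaining element count.
LoopStats simulate(uint32_t N) {
  const uint32_t VectorWidth = 4;
  assert(N <= UINT32_MAX - (VectorWidth - 1) && "round-up would overflow");
  const uint32_t Iterations = (N + VectorWidth - 1) / VectorWidth;
  int64_t Elts = N; // %7 on loop entry; only goes negative after the last iteration
  uint32_t Active = 0;
  for (uint32_t It = 0; It < Iterations; ++It) {
    for (uint32_t Lane = 0; Lane < VectorWidth; ++Lane)
      if (static_cast<int64_t>(Lane) < Elts)
        ++Active;
    Elts -= VectorWidth; // %9 = sub i32 %7, 4
  }
  return {Iterations, Active};
}
```

For N = 10, the loop runs 3 iterations with 4, 4 and 2 active lanes, i.e. %7 takes the values of the AddRec {%N,+,-4}.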
llvm/lib/Target/ARM/MVETailPredication.cpp (line 376)
The PHI check I was describing is essentially the existing `if (VectorWidth == StepValue)` check.