This is an archive of the discontinued LLVM Phabricator instance.

[ARM][MVE] tail-predication: checks for the elementcount, cont'd
ClosedPublic

Authored by SjoerdMeijer on Sep 22 2020, 3:56 AM.

Details

Summary

In D86074, we discussed sanity checks for intrinsic get.active.lane.mask's second argument, the element count. @efriedma summarised our steps as:

What you're doing here has two essential steps:

  • Convert "llvm.get.active.lane.mask(X, Y)" to "llvm.arm.mve.vctp(Y - X)".
  • Convert "Y - X" to a simpler induction variable.

and also that:

The way the code is currently written, I think you're trying to prove more than you actually need to. If the induction variable has the "wrong" base or increment, ARMLowOverheadLoops will ultimately fail to tail-predicate, but I'm not sure that's actually a problem."

That's why I propose to just skip that check here, which is why I have deleted it. I have kept the check when the values are constants, because those checks

are straightforward.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Sep 22 2020, 3:56 AM
SjoerdMeijer requested review of this revision.Sep 22 2020, 3:56 AM
efriedma added inline comments.Sep 22 2020, 11:51 AM
llvm/lib/Target/ARM/MVETailPredication.cpp
405

Maybe add a comment clarifying why we want to bail out here?

464–465

Should this be using EC, not TC?

464–465

I'm not confident this is actually proving what you want to prove? If x is in the range [0, 10), and y is in the range [0, 12), that doesn't prove x<y.

487

Do we need to check the first operand of the AddRec is zero?

Thanks for looking Eli.

I have addressed the minor comment.

Regarding your comments about the overflow checks: you are absolutely right about that these checks were not entirely right yet, which we discussed in a previous patch. I forgot to add a FIXME to the relevant code. What I have decided to do here is to just completely remove both checks 2.1 and 2.2 and put FIXMEs in there. Check 2.1 looking at it again, doesn't make sense, so don't see the point of keeping it. Check 2.2, needs improvements, but then I decided to also get rid of it because it needs a rewrite (and so I don't see the point of keeping it). As we are save for now, the intrinsic is not exposed to the user yet, I thought to do the clean up here first, and address the improvements in a follow up.

To give an impression, I have briefly looked at the overflow checks again that I have removed here. What I am thinking about for the follow up is to approach the overflow checks in a different way because I just don't see how SCEV and integer ranges or the "is always positive" helpers are going to help me. My observation is that for most loops that we want to support, we have these patterns:

tripcount: (1 + ((-4 + (4 * ((3 + %N) /u 4))<nuw>) /u 4))<nuw><nsw>
elemcount: %N

or:

tripcount: (1 + ((-4 + (4 * ({(3 + (sext i16 %Size to i32)),+,-1}<nw><%for.body> /u 4))<nuw>) /u 4))<nuw><nsw>
elemcount: {(sext i16 %Size to i32),+,-1}<nw><%for.body>

and write a SCEV visitor/matcher, and see if the elementcount is "part of the tripcount", i.e. if they are bound by the same variable modulo the rounding up, then we have a very strong indication that we are talking about a well behaving loop without any chance of overflow. So in a way it is similar to the code removed here in Check 1 but used for another purpose I guess.

efriedma accepted this revision.Sep 23 2020, 11:52 AM

If you'd prefer to delete the overflow code first, sure. LGTM

tripcount: (1 + ((-4 + (4 * ((3 + %N) /u 4))<nuw>) /u 4))<nuw><nsw>

Is this really the expression we end up with? It would be a lot simpler to analyze this if we could simplify to (3 + %N) /u 4).

and write a SCEV visitor/matcher, and see if the elementcount is "part of the tripcount", i.e. if they are bound by the same variable modulo the rounding up, then we have a very strong indication that we are talking about a well behaving loop without any chance of overflow. So in a way it is similar to the code removed here in Check 1 but used for another purpose I guess.

Isn't what you're describing part of check 1 (avoiding vctp in a loop that we probably can't tail-predicate)? Not sure it helps with proving the correctness?

Actually, I guess if you could prove that the tripcount is precisely equal to (ElementCount + VectorWidth - 1)/VectorWidth, you could also use that to prove the subtraction doesn't overflow.

This revision is now accepted and ready to land.Sep 23 2020, 11:52 AM

Actually, I guess if you could prove that the tripcount is precisely equal to (ElementCount + VectorWidth - 1)/VectorWidth, you could also use that to prove the subtraction doesn't overflow.

This sounds like the same suggestion that I made many moons ago... I suggested taking these values and substituting them into the expected SCEV expression, and then perform some SCEV algebra on it and the vector TC expression, until hopefully they both just equal ElementCount == ElementCount. My quick prototype 'worked', but I don't know if that says much.

Actually, I guess if you could prove that the tripcount is precisely equal to (ElementCount + VectorWidth - 1)/VectorWidth, you could also use that to prove the subtraction doesn't overflow.

This sounds like the same suggestion that I made many moons ago... I suggested taking these values and substituting them into the expected SCEV expression, and then perform some SCEV algebra on it and the vector TC expression, until hopefully they both just equal ElementCount == ElementCount. My quick prototype 'worked', but I don't know if that says much.

Sorry about that. I don't know if I missed that, or forgot about it.... but we have tried many things, and learned a few things. :-) But I am very happy with this as the way forward.

I wanted to write the new checks in a separate patch as I thought it would be a new lump of code, wanted to get this clean up first out of the way, but since our last idea it is probably best to continue here. I.e., the TC == (ElemCount+VW-1) / VW is hopefully just a minor addition.

efriedma added inline comments.Sep 24 2020, 11:37 AM
llvm/lib/Target/ARM/MVETailPredication.cpp
459

I'm not really comfortable using hasOperand here.

Can we actually do something like SE->getBackedgeTakenCount(L) == S->getUDiv(SE->getAdd(SE->getMul(Ceil, VectorWidth), SE->getNeg(VectorWidth)), VectorWidth)? Or is that not reliable enough?

487

Any further thought on this?

SjoerdMeijer added inline comments.Sep 24 2020, 12:12 PM
llvm/lib/Target/ARM/MVETailPredication.cpp
459

Yeah, I was trying to avoid that bit of pattern matching (and hasOperand was so convenient), but will give that a try tomorrow.

487

Sorry, forgot about this one, will also address this.

I am so happy that this approach works! I.e., this determines equality of TC and ElemenCount by calculating 2 scev expressions and subtracting them and testing the result for 0. Also a check for the base of the AddRec has been added now, so I think this addresses all comments.

efriedma accepted this revision.Sep 25 2020, 1:08 PM

LGTM!

Many thanks @efriedma and @samparker for your help with this work.

llvm/test/CodeGen/Thumb2/LowOverheadLoops/tail-pred-basic.ll