Page MenuHomePhabricator

ARM][MVE] tail-predication: overflow checks for backedge taken count

Authored by SjoerdMeijer on Aug 11 2020, 8:19 AM.



This pick ups the work on the overflow checks for, which ensure that it is safe to insert the VCTP intrinisc that enables tail-predication. For a 2d auto-correlation kernel and its inner loop j:

M = Size - i;
for (j = 0; j < M; j++)
  Sum += Input[j] * Input[j+i];

For this inner loop, the SCEV backedge taken count (BTC) expression is:

(-1 + (sext i16 %Size to i32)),+,-1}<nw><%for.body>

and LoopUtil cannotBeMaxInLoop couldn't calculate a bound on this, thus "BTC cannot be max" could not be determined. So overflow behaviour had to be assumed in the loop tripcount expression that uses the BTC. As a result tail-predication had to be forced (with an option) for this case.

This change solves that by using ScalarEvolution's helper getConstantMaxBackedgeTakenCount which is able to determine the range of BTC, thus can determine it is safe, so that we no longer need to force tail-predication as reflected in the changed test cases.

Diff Detail

Event Timeline

SjoerdMeijer created this revision.Aug 11 2020, 8:19 AM
SjoerdMeijer requested review of this revision.Aug 11 2020, 8:19 AM
efriedma accepted this revision.Aug 11 2020, 1:53 PM

It's nice that we have some utility that computes what you need, even if it's not the obvious one.


I'm forgetting, do we have a check somewhere that BackedgeTakenCount is actually the backedge-taken count of the loop?



This revision is now accepted and ready to land.Aug 11 2020, 1:53 PM
SjoerdMeijer added a comment.EditedAug 12 2020, 1:02 AM

Thanks for reviewing!

It's nice that we have some utility that computes what you need, even if it's not the obvious one.

Yes, that's why I was not entirely happy. Think I have seen a few non-obvious things in the scev helpers, and a bug too hidden by other things, but one step at a time here...


Good point. I thought we had one, but we have similar checks for the IV, which is done in step 3) below on line 464. Will address this in a follow up.

efriedma added inline comments.Aug 17 2020, 1:21 PM

I think I messed up reviewing this. Took me a bit of time to remember what's going on here.

SE->getConstantMaxBackedgeTakenCount() is the backedge-taken count of the loop. BTC is the number of elements the loop processes, minus one. If you want to ensure BTC + 1 doesn't overflow, getConstantMaxBackedgeTakenCount() doesn't actually help, unless you prove some connection between the two values. The code currently in IsSafeActiveMask does not try to prove that connection.

efriedma added inline comments.Aug 17 2020, 1:38 PM

On a higher-level note, I think this work has shown that trying to force this to work is really complicated, and it might make sense to change the vectorizer to generate something that's easier to analyze.

SjoerdMeijer added inline comments.Aug 18 2020, 2:38 AM

Ah, sorry about this. Don't know what I was thinking...somehow had myself convinced, but it's obviously not really what we need.

I fully agree about your high-level remark. This whole exercise has shown the limitations of our current approach, i.e. the complex analysis required, which we can hopefully avoid by changing the intrinsic/semantics. So, I am going to pursue that direction, and will soon propose something for this.