This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Use overflow-check analysis to improve tail-folding.

Authored by sdesmalen on Jan 30 2023, 6:51 AM.



This work follows on from D142109 and addresses a possible regression
when we know the loop iteration counter cannot overflow.

When we know the overflow-check always evaluates to false, it's better to
use the other style of tail folding where it assumes a runtime check was
added, because that avoids having to calculate a modified trip-count.

Diff Detail

Event Timeline

sdesmalen created this revision.Jan 30 2023, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 30 2023, 6:51 AM
sdesmalen requested review of this revision.Jan 30 2023, 6:51 AM
sdesmalen updated this revision to Diff 496151.Feb 9 2023, 9:06 AM

Rebased patch.

Nice! I had a few minor comments, except for a possible issue with the use of getSmallBestKnownTC.


Can you add some /// comments here please?


This is just a thought, but I think you can probably simplify this to just:

TailFoldingStyle Style = Cost->getTailFoldingStyle();
if (Style == TailFoldingStyle::None)
  CheckMinIters =
      Builder.CreateICmp(P, Count, CreateStep(), "min.iters.check");
else if (VF.isScalable() &&
           Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck &&
           !Cost->isIndvarOverflowCheckKnownFalse(VF, UF)) {

That way you only need to call isIndvarOverflowCheckKnownFalse just before you're about to actually create the checks. What do you think?


I wonder if this is safe because getSmallBestKnownTC also returns a value from profiling for the expected trip count, which is more like a hint rather than a definite?


nit: Maybe the entry blocks in both tests can be removed to simplify the IR and CHECK lines? I think the only thing that matters here is the zext in the for.body.preheader, right?

paulwalker-arm added inline comments.Feb 14 2023, 9:33 AM

It seems weird to ask the target for their preferred style and then immediate override it. Is there a reason not to pass IVUpdateCannotOverflow to getPreferredTailFoldingStyle?


Up to you but personally I think for all instances the reverse polarity IVUpdateCanOverflow reads better.


Forgive my ignorance here but this doesn't look like a cost model question?


num-elts-per-iteration or (VF * UF)?


This doesn't look entirely safe. I think you really want SE.getSmallConstantTripCount?


MaxVF? because Elts is implied.


We so need to get rid of TTI::getMaxVScale(). I think function attributes should take precedence but then this is the order used within getMaxLegalScalableVF so perhaps best fixed separately. Up to you.


I suppose this can overflow. Perhaps make MaxVFElts an uint64_t given that's what ugt will promote to anyway.


Is addCanonicalIVRecipes being passed the "wrong" tail folding style a functional issue? or just a corner case that might affect performance.

sdesmalen updated this revision to Diff 499552.Feb 22 2023, 9:27 AM
sdesmalen marked 12 inline comments as done.
  • IVUpdateCannotOverflow -> IVUpdateMayOverflow
  • Added 'IVUpdateMayOverflow' as operand to getPreferredTailFoldingStyle.
  • Runtime check is no longer emitted when tail-folding when we know the check evaluates to false.
sdesmalen added inline comments.Feb 22 2023, 9:27 AM

I'd rather fix that separately so that the order is consistent between getMaxLegalScalableVF and here.


This is not a functional issue. In the worst case it generates both the runtime check and computes the slightly more expensive runtime trip-count in the preheader.


When I change that, ScalarEvolution doesn't recognise the maximum number of iterations that way, and so it vectorizes with the different tail folding style.

sdesmalen updated this revision to Diff 499849.Feb 23 2023, 7:20 AM

Rebased to use getMaxVScale function introduced in rG9449deda12c4

paulwalker-arm accepted this revision.Feb 24 2023, 7:05 AM
This revision is now accepted and ready to land.Feb 24 2023, 7:05 AM
This revision was landed with ongoing or failed builds.Mar 1 2023, 6:18 AM
This revision was automatically updated to reflect the committed changes.