This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Use overflow-check analysis to improve tail-folding.
ClosedPublic

Authored by sdesmalen on Jan 30 2023, 6:51 AM.

Details

Summary

This work follows on from D142109 and addresses a possible regression
when we know the loop iteration counter cannot overflow.

When we know the overflow-check always evaluates to false, it's better to
use the other style of tail folding where it assumes a runtime check was
added, because that avoids having to calculate a modified trip-count.

Diff Detail

Event Timeline

sdesmalen created this revision.Jan 30 2023, 6:51 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 30 2023, 6:51 AM
sdesmalen requested review of this revision.Jan 30 2023, 6:51 AM
sdesmalen updated this revision to Diff 496151.Feb 9 2023, 9:06 AM

Rebased patch.

Nice! I had a few minor comments, except for a possible issue with the use of getSmallBestKnownTC.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1571

Can you add some /// comments here please?

3085

This is just a thought, but I think you can probably simplify this to just:

TailFoldingStyle Style = Cost->getTailFoldingStyle();
if (Style == TailFoldingStyle::None)
  CheckMinIters =
      Builder.CreateICmp(P, Count, CreateStep(), "min.iters.check");
else if (VF.isScalable() &&
           Style != TailFoldingStyle::DataAndControlFlowWithoutRuntimeCheck &&
           !Cost->isIndvarOverflowCheckKnownFalse(VF, UF)) {
  ...

That way you only need to call isIndvarOverflowCheckKnownFalse just before you're about to actually create the checks. What do you think?

3542

I wonder if this is safe because getSmallBestKnownTC also returns a value from profiling for the expected trip count, which is more like a hint rather than a definite?

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-overflow-checks.ll
46

nit: Maybe the entry blocks in both tests can be removed to simplify the IR and CHECK lines? I think the only thing that matters here is the zext in the for.body.preheader, right?

paulwalker-arm added inline comments.Feb 14 2023, 9:33 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
1580–1581

It seems weird to ask the target for their preferred style and then immediate override it. Is there a reason not to pass IVUpdateCannotOverflow to getPreferredTailFoldingStyle?

3084

Up to you but personally I think for all instances the reverse polarity IVUpdateCanOverflow reads better.

3528

Forgive my ignorance here but this doesn't look like a cost model question?

3540

num-elts-per-iteration or (VF * UF)?

3542

This doesn't look entirely safe. I think you really want SE.getSmallConstantTripCount?

3543

MaxVF? because Elts is implied.

3545

We so need to get rid of TTI::getMaxVScale(). I think function attributes should take precedence but then this is the order used within getMaxLegalScalableVF so perhaps best fixed separately. Up to you.

3548

I suppose this can overflow. Perhaps make MaxVFElts an uint64_t given that's what ugt will promote to anyway.

9028

Is addCanonicalIVRecipes being passed the "wrong" tail folding style a functional issue? or just a corner case that might affect performance.

sdesmalen updated this revision to Diff 499552.Feb 22 2023, 9:27 AM
sdesmalen marked 12 inline comments as done.
  • IVUpdateCannotOverflow -> IVUpdateMayOverflow
  • Added 'IVUpdateMayOverflow' as operand to getPreferredTailFoldingStyle.
  • Runtime check is no longer emitted when tail-folding when we know the check evaluates to false.
sdesmalen added inline comments.Feb 22 2023, 9:27 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
3545

I'd rather fix that separately so that the order is consistent between getMaxLegalScalableVF and here.

9028

This is not a functional issue. In the worst case it generates both the runtime check and computes the slightly more expensive runtime trip-count in the preheader.

llvm/test/Transforms/LoopVectorize/AArch64/sve-tail-folding-overflow-checks.ll
46

When I change that, ScalarEvolution doesn't recognise the maximum number of iterations that way, and so it vectorizes with the different tail folding style.

sdesmalen updated this revision to Diff 499849.Feb 23 2023, 7:20 AM

Rebased to use getMaxVScale function introduced in rG9449deda12c4

paulwalker-arm accepted this revision.Feb 24 2023, 7:05 AM
This revision is now accepted and ready to land.Feb 24 2023, 7:05 AM
This revision was landed with ongoing or failed builds.Mar 1 2023, 6:18 AM
This revision was automatically updated to reflect the committed changes.