This is an archive of the discontinued LLVM Phabricator instance.

[LV] Use SCEV to check if the trip count <= VF * UF.
ClosedPublic

Authored by fhahn on Aug 31 2022, 6:05 AM.

Details

Summary

Just comparing constant trip counts causes LV to miss cases where the
vector loop body only executes once.

The motivation for this is to remove the need for unrolling to remove
vector loop back-edges, if the body only executes once.

It requires using non-recursive SCEV reasoning, as at this stage the CFG
is incomplete and only reasoning based on existing expression can be
used. In particular, more complex strategies like proving via induction
cannot be used, as LI/DT may be out of date.

Alternatively the result of the check could be computed earlier.

Diff Detail

Event Timeline

fhahn created this revision.Aug 31 2022, 6:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2022, 6:05 AM
fhahn requested review of this revision.Aug 31 2022, 6:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2022, 6:05 AM
reames added inline comments.Aug 31 2022, 1:29 PM
llvm/lib/Transforms/Vectorize/VPlan.cpp
608

Please don't change the SCEV interface for this. Just use isKnownPredicate. You don't care if the reasoning is recursive or not.

fhahn added inline comments.Sep 1 2022, 2:24 PM
llvm/lib/Transforms/Vectorize/VPlan.cpp
608

The issue here is that we would need to limit the reasoning to methods that don’t try to look at the current IR, just at the SCEV expression, because the IR is in incomplete/partially modified state.

Using non-recursive reasoning is a proxy for that, but I’m not sure if it provides the required guarantees in all cases.

The alternative would be to compute the property before we start modifying the CFG, like we do for trip count expansion.

Ayal added inline comments.Sep 6 2022, 2:47 AM
llvm/lib/Transforms/Vectorize/VPlan.cpp
608

Analyzing and optimizing VPlan should ideally take place as VPlan2VPlan and reflected in VPlan itself before starting/preparing to execute it - at which time the IR is in incomplete/partially modified state. This case of optimizing away the latch branch for single iteration loops depends on UF which VPlan is currently agnostic to and encounters only when executing. How about cloning VPlans (also) according to which UF*VF's result in a single iteration (upto reasonable UF values - computeMaxUF()?), updating the default "UF>=1" in VPlan names and extending getBestPlanFor() to pass both VF and UF? That may admittedly require splitting ranges currently serving multiple VF's.

fhahn updated this revision to Diff 464511.Oct 1 2022, 12:47 PM

Rebased on top of D135017, which contains the main refactoring to prepare for this change, which now just changes to use ScalarEvolution::isKnowPredicate.

fhahn marked 2 inline comments as done.Oct 1 2022, 12:50 PM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlan.cpp
608

I put up D135017 which is a step in between: perform the simplification as VP2VP transform that's VF & UF specific. This means we simplify before IR modifications and ScalarEvolution can be used without worrying about querying while the IR is in an incomplete state.

Ayal added inline comments.Oct 2 2022, 7:52 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
469 ↗(On Diff #464511)

Could this be simplified using SE.getSmallConstantMaxTripCount(L)? Or a caching thereof.

fhahn marked 2 inline comments as done.Dec 18 2022, 4:16 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
469 ↗(On Diff #464511)

SE.getSmallConstantMaxTripCount(L) only supports the case where the trip count is a constant, while isKnownPredicate supports the non-constant case as well.

fhahn updated this revision to Diff 483809.Dec 18 2022, 4:20 AM
fhahn marked an inline comment as done.

rebase after update to D135017

Ayal accepted this revision.Dec 20 2022, 5:26 AM

Looks good to me, thanks.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
483 ↗(On Diff #483809)

Thought: create vector trip count SCEV, once VF and UF are set; use it to create vector trip count Value, and also to check if it is known to be less than or equal to 1?

469 ↗(On Diff #464511)

Ok. Non-constant case seems a bit extreme, unless tail is folded - vectorizing and unrolling to a known upper bound will otherwise run the original scalar loop whenever its trip count misses the exact upper bound.

This revision is now accepted and ready to land.Dec 20 2022, 5:26 AM
fhahn updated this revision to Diff 485193.Dec 24 2022, 3:25 AM

rebase after recent changes

This revision was landed with ongoing or failed builds.Dec 24 2022, 10:35 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 2 inline comments as done.Dec 24 2022, 10:39 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
483 ↗(On Diff #483809)

Will do as follow-up! Would be good to integrate it once the loop skeleton is also created in VPlan.

469 ↗(On Diff #464511)

Yes, the transform just removes an unneeded branch; deciding whether it is profitable to vectorize with scalar tail will still be done elsewhere.