Page MenuHomePhabricator

[LV] Use SCEV to check if the trip count <= VF * UF.
Needs ReviewPublic

Authored by fhahn on Aug 31 2022, 6:05 AM.



Just comparing constant trip counts causes LV to miss cases where the
vector loop body only executes once.

The motivation for this is to remove the need for unrolling to remove
vector loop back-edges, if the body only executes once.

It requires using non-recursive SCEV reasoning, as at this stage the CFG
is incomplete and only reasoning based on existing expression can be
used. In particular, more complex strategies like proving via induction
cannot be used, as LI/DT may be out of date.

Alternatively the result of the check could be computed earlier.

Diff Detail

Event Timeline

fhahn created this revision.Aug 31 2022, 6:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2022, 6:05 AM
fhahn requested review of this revision.Aug 31 2022, 6:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 31 2022, 6:05 AM
reames added inline comments.Aug 31 2022, 1:29 PM
608 ↗(On Diff #456943)

Please don't change the SCEV interface for this. Just use isKnownPredicate. You don't care if the reasoning is recursive or not.

fhahn added inline comments.Sep 1 2022, 2:24 PM
608 ↗(On Diff #456943)

The issue here is that we would need to limit the reasoning to methods that don’t try to look at the current IR, just at the SCEV expression, because the IR is in incomplete/partially modified state.

Using non-recursive reasoning is a proxy for that, but I’m not sure if it provides the required guarantees in all cases.

The alternative would be to compute the property before we start modifying the CFG, like we do for trip count expansion.

Ayal added inline comments.Sep 6 2022, 2:47 AM
608 ↗(On Diff #456943)

Analyzing and optimizing VPlan should ideally take place as VPlan2VPlan and reflected in VPlan itself before starting/preparing to execute it - at which time the IR is in incomplete/partially modified state. This case of optimizing away the latch branch for single iteration loops depends on UF which VPlan is currently agnostic to and encounters only when executing. How about cloning VPlans (also) according to which UF*VF's result in a single iteration (upto reasonable UF values - computeMaxUF()?), updating the default "UF>=1" in VPlan names and extending getBestPlanFor() to pass both VF and UF? That may admittedly require splitting ranges currently serving multiple VF's.

fhahn updated this revision to Diff 464511.Oct 1 2022, 12:47 PM

Rebased on top of D135017, which contains the main refactoring to prepare for this change, which now just changes to use ScalarEvolution::isKnowPredicate.

fhahn marked 2 inline comments as done.Oct 1 2022, 12:50 PM
fhahn added inline comments.
608 ↗(On Diff #456943)

I put up D135017 which is a step in between: perform the simplification as VP2VP transform that's VF & UF specific. This means we simplify before IR modifications and ScalarEvolution can be used without worrying about querying while the IR is in an incomplete state.

Ayal added inline comments.Oct 2 2022, 7:52 AM

Could this be simplified using SE.getSmallConstantMaxTripCount(L)? Or a caching thereof.