Details

Reviewers

Ayal
reames
dmgreen
gilr

Commits

rG9758242046b3: [LV] Use SCEV to check if the trip count <= VF * UF.

Summary

Just comparing constant trip counts causes LV to miss cases where the
vector loop body only executes once.

The motivation for this is to remove the need for unrolling to remove
vector loop back-edges, if the body only executes once.

It requires using non-recursive SCEV reasoning, as at this stage the CFG
is incomplete and only reasoning based on existing expressions can be
used. In particular, more complex strategies like proving via induction
cannot be used, as LI/DT may be out of date.

Alternatively the result of the check could be computed earlier.
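
As a rough illustration of the check described in the summary (not the code from this revision), here is a minimal self-contained sketch. It assumes the trip count is known only as an inclusive unsigned range, which is a simplification of what SCEV can express; `TripCountRange` and `vectorBodyRunsAtMostOnce` are hypothetical names introduced for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for what an analysis knows about a trip count:
// an inclusive unsigned range. A constant trip count has Min == Max.
struct TripCountRange {
  uint64_t Min;
  uint64_t Max;
};

// Returns true if the trip count is known to be <= VF * UF for every value
// in the range, i.e. the vector loop body executes at most once and its
// back-edge is not needed.
bool vectorBodyRunsAtMostOnce(TripCountRange TC, uint64_t VF, uint64_t UF) {
  assert(VF > 0 && UF > 0 && "VF and UF must be positive");
  assert(TC.Min <= TC.Max && "malformed range");
  return TC.Max <= VF * UF;
}
```

A check that only handles constant trip counts would cover the `Min == Max` case; the range form also covers a trip count that is not constant but is provably bounded.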

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision. Aug 31 2022, 6:05 AM

Herald added a project: Restricted Project. Aug 31 2022, 6:05 AM

Herald added subscribers: rogfer01, javed.absar, hiraditya.

fhahn requested review of this revision. Aug 31 2022, 6:05 AM

Herald added a project: Restricted Project. Aug 31 2022, 6:05 AM

Herald added subscribers: pcwang-thead, vkmr.

fhahn mentioned this in D115261: [LV] Disable runtime unrolling for vectorized loops. Aug 31 2022, 6:13 AM

Harbormaster completed remote builds in B184351: Diff 456943. Aug 31 2022, 6:48 AM

reames added inline comments. Aug 31 2022, 1:29 PM

llvm/lib/Transforms/Vectorize/VPlan.cpp
608 ↗	(On Diff #456943)	Please don't change the SCEV interface for this. Just use isKnownPredicate. You don't care if the reasoning is recursive or not.

fhahn added inline comments. Sep 1 2022, 2:24 PM

llvm/lib/Transforms/Vectorize/VPlan.cpp
608 ↗	(On Diff #456943)	The issue here is that we would need to limit the reasoning to methods that don’t try to look at the current IR, just at the SCEV expression, because the IR is in incomplete/partially modified state. Using non-recursive reasoning is a proxy for that, but I’m not sure if it provides the required guarantees in all cases. The alternative would be to compute the property before we start modifying the CFG, like we do for trip count expansion.

Ayal added inline comments. Sep 6 2022, 2:47 AM

llvm/lib/Transforms/Vectorize/VPlan.cpp
608 ↗	(On Diff #456943)	Analyzing and optimizing VPlan should ideally take place as VPlan2VPlan and be reflected in VPlan itself before starting/preparing to execute it, at which time the IR is in an incomplete/partially modified state. This case of optimizing away the latch branch for single-iteration loops depends on UF, which VPlan is currently agnostic to and encounters only when executing. How about cloning VPlans (also) according to which UF*VF's result in a single iteration (up to reasonable UF values - computeMaxUF()?), updating the default "UF>=1" in VPlan names and extending getBestPlanFor() to pass both VF and UF? That may admittedly require splitting ranges currently serving multiple VF's.
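
To make the range-splitting idea in the comment above concrete, here is a small sketch under stated assumptions: the trip count has a known upper bound, and for a fixed VF we look for the smallest UF at which the vector loop becomes single-iteration. The function name `smallestSingleIterationUF` and the use of 0 as a "none" result are hypothetical; nothing here reflects actual VPlan code.

```cpp
#include <cassert>
#include <cstdint>

// For a fixed VF, returns the smallest UF in [1, MaxUF] such that
// MaxTripCount <= VF * UF, or 0 if no UF in that range qualifies.
// UFs below the returned value need a back-edge; UFs at or above it do not,
// so a plan covering all UFs would have to be split at this point.
uint64_t smallestSingleIterationUF(uint64_t MaxTripCount, uint64_t VF,
                                   uint64_t MaxUF) {
  assert(VF > 0 && "VF must be positive");
  for (uint64_t UF = 1; UF <= MaxUF; ++UF)
    if (MaxTripCount <= VF * UF)
      return UF;
  return 0;
}
```

For example, with a trip count bounded by 16 and VF = 4, plans with UF >= 4 are single-iteration while UF in {1, 2, 3} still need the latch branch.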

fhahn mentioned this in D135017: [LV] Move exit cond simplification to separate transform. Oct 1 2022, 12:42 PM

Rebased on top of D135017, which contains the main refactoring to prepare for this change. This revision now just switches to using ScalarEvolution::isKnownPredicate.

Harbormaster completed remote builds in B189858: Diff 464511. Oct 1 2022, 12:47 PM

fhahn marked 2 inline comments as done. Oct 1 2022, 12:50 PM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/VPlan.cpp
608 ↗	(On Diff #456943)	I put up D135017 which is a step in between: perform the simplification as VP2VP transform that's VF & UF specific. This means we simplify before IR modifications and ScalarEvolution can be used without worrying about querying while the IR is in an incomplete state.

Ayal added inline comments. Oct 2 2022, 7:52 AM

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
488	Could this be simplified using SE.getSmallConstantMaxTripCount(L)? Or a caching thereof.

fhahn added a parent revision: D135017: [LV] Move exit cond simplification to separate transform. Dec 18 2022, 2:54 AM

fhahn marked 2 inline comments as done. Dec 18 2022, 4:16 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
488	`SE.getSmallConstantMaxTripCount(L)` only supports the case where the trip count is a constant, while `isKnownPredicate` supports the non-constant case as well.

rebase after update to D135017

Harbormaster completed remote builds in B203798: Diff 483809. Dec 18 2022, 4:34 AM

Looks good to me, thanks.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
483–485	Thought: create vector trip count SCEV, once VF and UF are set; use it to create vector trip count Value, and also to check if it is known to be less than or equal to 1?
488	Ok. Non-constant case seems a bit extreme, unless tail is folded - vectorizing and unrolling to a known upper bound will otherwise run the original scalar loop whenever its trip count misses the exact upper bound.
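
The point about missing the exact upper bound can be checked with a little arithmetic. The sketch below assumes no tail folding and no required scalar epilogue, so the vector loop handles whole multiples of VF * UF and the scalar loop handles the remainder; `IterationSplit` and `splitIterations` are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstdint>

// How the original iterations are divided between the vector loop and the
// scalar remainder loop (assuming no tail folding).
struct IterationSplit {
  uint64_t VectorIterations; // iterations of the vector loop body
  uint64_t ScalarIterations; // original iterations left to the scalar loop
};

IterationSplit splitIterations(uint64_t TripCount, uint64_t VF, uint64_t UF) {
  assert(VF > 0 && UF > 0 && "VF and UF must be positive");
  uint64_t Step = VF * UF;
  return {TripCount / Step, TripCount % Step};
}
```

With VF * UF = 16 and a trip count known only to be at most 16, the vector body runs once when the trip count is exactly 16; for any smaller value it does not run at all and every iteration goes through the scalar loop.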

This revision is now accepted and ready to land. Dec 20 2022, 5:26 AM

fhahn mentioned this in rGe1650c8d5291: [LV] Move exit cond simplification to separate transform. Dec 23 2022, 4:52 AM

rebase after recent changes

Harbormaster completed remote builds in B204843: Diff 485193. Dec 24 2022, 3:36 AM

This revision was landed with ongoing or failed builds. Dec 24 2022, 10:35 AM

Closed by commit rG9758242046b3: [LV] Use SCEV to check if the trip count <= VF * UF. (authored by fhahn).

This revision was automatically updated to reflect the committed changes.

fhahn added a commit: rG9758242046b3: [LV] Use SCEV to check if the trip count <= VF * UF.

fhahn marked 2 inline comments as done. Dec 24 2022, 10:39 AM

fhahn added inline comments.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
483–485	Will do as follow-up! Would be good to integrate it once the loop skeleton is also created in VPlan.
488	Yes, the transform just removes an unneeded branch; deciding whether it is profitable to vectorize with scalar tail will still be done elsewhere.

This is an archive of the discontinued LLVM Phabricator instance.

[LV] Use SCEV to check if the trip count <= VF * UF.
Closed · Public

Diff 485205

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp

llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination.ll
