This is an archive of the discontinued LLVM Phabricator instance.

[LV] Move exit cond simplification to separate transform.
ClosedPublic

Authored by fhahn on Oct 1 2022, 12:42 PM.

Details

Summary

This sets the stage for D133017 by moving out the code that performs
VPlan based simplifications to a separate transform that takes the
chosen VF & UF as arguments.

The main advantage is that this transform runs before any changes to
the CFG are being made. This allows using SCEV without worrying about
making queries while the IR is in an incomplete state.

Note that this patch switches the reasoning to use SCEV, but still only
simplifies loops with constant trip counts. Using SCEV here is needed to
access the backedge taken count, because the trip count IR value has not
been created yet.

Diff Detail

Event Timeline

fhahn created this revision.Oct 1 2022, 12:42 PM
fhahn requested review of this revision.Oct 1 2022, 12:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 1 2022, 12:42 PM
Ayal added inline comments.Oct 2 2022, 7:32 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
464

TripCountV indeed hasn't been created yet, but perhaps there's a simpler way to check this condition, e.g., by checking if PSE.getBackedgeCount() is a constant greater than VF*UF-1, instead of incrementing it by 1 of type IdxTy; or trying to call SE.getSmallConstantTripCount(L); or recording it in VPlan, possibly in CanonicalIV recipe.

474

Worth an unconditional branch VPInstruction instead of the effort to generate a redundant True? Can be fixed independent of this patch, which is working a bit harder to generate it.

llvm/lib/Transforms/Vectorize/VPlanTransforms.h
66

Missing documentation.

llvm/test/Transforms/LoopVectorize/lcssa-crashes.ll
16

Is this a regression from previously folding the conditional branch?

Ayal added inline comments.Oct 2 2022, 7:40 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7623

nit: can drop "Concrete".

Restrict the VF range of BestPlan to include only BestVF?

When printing BestPlan would be good to now indicate that it applies to BestUF only.

fhahn updated this revision to Diff 483808.Dec 18 2022, 2:52 AM
fhahn marked 2 inline comments as done.

Address latest comments, thanks!

fhahn marked 3 inline comments as done.Dec 18 2022, 2:59 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7623

nit: can drop "Concrete".

Done!

Restrict the VF range of BestPlan to include only BestVF?

Done!

When printing BestPlan would be good to now indicate that it applies to BestUF only.

Should I add a field for that?

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
464

That's what I started out with, but the trip count (BTC + 1) allows additional simplifications in D133017 as BTC's often involve a -1, which means it may be negative limiting SCEV simplifications.

474

It should probably be sufficient to just set a single successor. We will have to also replace & remove the header phis.

llvm/lib/Transforms/Vectorize/VPlanTransforms.h
66

added, thanks!

llvm/test/Transforms/LoopVectorize/lcssa-crashes.ll
16

The test case is spurious, the vector body will never be executed, as the exit condition compares to undef.

Ayal accepted this revision.Dec 20 2022, 2:05 AM

Looks good to me, adding various nits.

In the summary, perhaps indicate that the optimization is moved earlier to take place before creating the skeleton, as soon as the UF is available.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7623

When printing BestPlan would be good to now indicate that it applies to BestUF only.

Should I add a field for that?

Probably, to then print it instead of "UF >= 1". Can also add setUF(), and an assert when executing (the epilog loop) that the plan used works for the current UF.

llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
454

So we retain "if TC <= VF * UF" ?

456

nit: can fold this comment and logic into "canSimplifyLoopBranch(Term)" (independent of this patch).

466

nit: ExitCount >> TripCount ?

nit: where is createTripCountSCEV defined? Should VPlan or its loop Region provide it, similar to providing the canonical IV, or even have the canonical IV recipe provide it? When optimizing the loop branch recipe, one can also optimize the canonical IV recipe...

469

nit: is the isZero() check needed?

474

OK, right, there should be no need for an unconditional branch VPInstruction, once CFG is straightened, as in the 2nd TODO.

llvm/lib/Transforms/Vectorize/VPlanTransforms.h
67

nit: may restrict >> restricts ?

llvm/test/Transforms/LoopVectorize/lcssa-crashes.ll
16

nit: worth fixing the test somehow, to clarify that no folding is lost here?

This revision is now accepted and ready to land.Dec 20 2022, 2:05 AM
fhahn updated this revision to Diff 485087.Dec 23 2022, 4:37 AM
fhahn marked 3 inline comments as done.

Rebase and address nits, thanks! I am planning on landing this shortly.

fhahn marked 4 inline comments as done.Dec 23 2022, 4:50 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7623

Thanks, I landed the following improvements separately:

  • move generating of VFs & UFs as part of getName to ensure it always prints the up-to-date ones: 96296922b61f
  • tracking of UF in VPlan: 5df34e971dd5
  • add asserts in executePlan that plan has selected VF + UF: b7b1e5c96fe0
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
454

Yes, updated!

466

nit: ExitCount >> TripCount ?

Fixed, thanks!

nit: where is createTripCountSCEV defined?

At the moment it is a regular function, as it is also used outside Plan. But once the scaffolding-generation is also handled by VPlan it could be moved there.

469

Yes for now I think; if it is zero then the +1 overflowed and this check is the same as in the previous version of the code.

This revision was landed with ongoing or failed builds.Dec 23 2022, 4:52 AM
This revision was automatically updated to reflect the committed changes.
fhahn marked 4 inline comments as done.
Ayal added inline comments.Dec 26 2022, 3:09 AM
llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
469

Right, this is what the previous version was doing, and zero stands for a wrapped-around maximal BTC. But CheckMinIters bails out in such (Max iteration) cases nonetheless, w/o reaching the latch? Probably deserves a comment and/or could be simplified into C->isZero().