This is an archive of the discontinued LLVM Phabricator instance.

[LV] Skip VFs < iterations remaining for epilogue vectorization.
ClosedPublic

Authored by fhahn on Jun 30 2023, 2:31 PM.

Details

Summary

If a candidate VF for epilogue vectorization is less than the number of
remaining iterations, the epilogue loop would be dead. Skip such factors.

Diff Detail

Event Timeline

fhahn created this revision.Jun 30 2023, 2:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2023, 2:31 PM
fhahn requested review of this revision.Jun 30 2023, 2:31 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 30 2023, 2:31 PM
Ayal added inline comments.Jul 2 2023, 12:45 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5679

nit (unrelated): avoid else after return.

5709

nit: can set Type *TCType = Legal->getWidestInductionType(); and use it below.

5722

Worth early-continuing first if (a) !hasPlanWithVF, then if (b) NextVF >= EstimatedRuntimeVF, and last if (c) NextVF >= RemainingIterations?

Note that for IC==1 and non scalable VF's, check (c) subsumes check (b).

Can the checks below be simplified, given that EstimatedRuntimeVF = MainLoopVF if !MainLoopVF.isScalable()?
Instead of checking if Result.Width.isScalar() better check if Result is still uninitialized, i.e., == VectorizationFactor::Disabled()? Or teach isMoreProfitable() to prefer any computed cost over an uncomputed one.

llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll
866

nit: these UGLY-to-SCEV changes are unrelated.

llvm/test/Transforms/LoopVectorize/X86/limit-vf-by-tripcount.ll
7

This patch addresses this TODO?

llvm/test/Transforms/LoopVectorize/X86/pr42674.ll
8

Note that the change in generated code for this test appears to be indirectly related to fixing the VF selected for the epilog loop.
This loop is vectorized with VF=64, UF=2 both w/ and w/o the patch, yet w/o the patch its epilog loop is also vectorized (with VF=32), which is cleaned up by instcombine and simplifycfg, but the main loop itself remains. Now the epilog loop is not vectorized and the main single-vectorized-iteration loop gets cleaned up.

fhahn updated this revision to Diff 536903.Jul 3 2023, 2:46 PM
fhahn marked 3 inline comments as done.

Address latest comments, thanks!

Ayal added inline comments.Jul 4 2023, 5:45 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5707

While we're here, should EstimatedRuntimeVF (or rather, EstimatedMainLoopVF) be used for both scalable and non-scalable MainLoopVF below?
Are all 4 combinations of main loop and epilog loop VF's being scalable or non-scalable, possible?
Can the filtering of VF's larger than EstimatedMainLoopVF be simplified into a single ElementCount::isKnownGE(NextVF.Width, EstimatedMainLoopVF) check for all combinations?
Is the SCEV RemainingIterations check relevant only for the {non-scalable & non-scalable} combination, or worth leaving a TODO?

5709

nit: reverse the condition to use ICMP_UGE as done above?

fhahn updated this revision to Diff 537116.Jul 4 2023, 8:58 AM

Use UGT instead of ULT as suggeted, thanks!

fhahn updated this revision to Diff 537117.Jul 4 2023, 9:01 AM

Add TODO for scalable vectors.

fhahn marked 2 inline comments as done.Jul 4 2023, 9:07 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5679

Will adjust separately.

5707

While we're here, should EstimatedRuntimeVF (or rather, EstimatedMainLoopVF) be used for both scalable and non-scalable MainLoopVF below?
Are all 4 combinations of main loop and epilog loop VF's being scalable or non-scalable, possible?
Can the filtering of VF's larger than EstimatedMainLoopVF be simplified into a single ElementCount::isKnownGE(NextVF.Width, EstimatedMainLoopVF) check for all combinations?

In theory I think yes, all 4 combinations are possible

I tried just checking EstimatedRuntimeVF and not MainVF but there are some SVE tests that are failing.

Is the SCEV RemainingIterations check relevant only for the {non-scalable & non-scalable} combination, or worth leaving a TODO?

I added a TODO. I think for scalable vectors we would also need to estimate the runtime VFs and check those.

5709

Done, thanks!

5709

updated, thanks!

5722

Applied the first refactoring in b4efc0f070ba, update this patch, thanks! Will do the refactorings mentioned in the last paragraph separately.

Ayal accepted this revision.Jul 4 2023, 12:05 PM

This looks good to me, comment still needs to be fixed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5709

nit: comment still needs to be fixed.

This revision is now accepted and ready to land.Jul 4 2023, 12:05 PM
Matt added a subscriber: Matt.Jul 6 2023, 2:14 PM
This revision was landed with ongoing or failed builds.Jul 7 2023, 12:34 PM
This revision was automatically updated to reflect the committed changes.
fhahn marked 2 inline comments as done.