This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Introduce trip count minimal value threshold to ignore tail-folding for scalable vectors
ClosedPublic

Authored by dtemirbulatov on Jul 29 2022, 3:38 AM.

Details

Summary

After D121595 was commited, I noticed regressions assosicated with small trip count numbers vectorization by tail folding with scalable vectors. As a solution for those issues I propose to introduce the minimal trip count threshold value.

Diff Detail

Event Timeline

dtemirbulatov created this revision.Jul 29 2022, 3:38 AM
dtemirbulatov requested review of this revision.Jul 29 2022, 3:38 AM
paulwalker-arm added inline comments.Jul 29 2022, 5:52 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
192

Typo here, should be TailFolding?

dtemirbulatov marked an inline comment as done.Jul 29 2022, 6:42 AM
dtemirbulatov added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
192

Done. thanks Paul.

paulwalker-arm added inline comments.Aug 4 2022, 4:27 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
192–195

We're late into the LLVM 15 release cycle so here we're trying to fix an LLVM 14 regression without resorting to reversing the decision to use tail folding for all low trip count loops.

I think it'll be better to now do this in a way with the most minimal impact to other targets and so rather than a global command line option can you instead add a TTI hook. That way we can restrict the change to just AArch64/SVE and if it turns out we can rely on a more accurate cost model in the future and there are no other uses at that time, we can remove the TTI hook.

5078–5082

This doesn't look like the correct place to decide this. Within LoopVectorizePass::processLoop the code exists to choose CM_ScalarEpilogueNotAllowedLowTripLoop for low trip count looks. I think that's the code we want to change to also consider the new tail folding specific watermark.

Matt added a subscriber: Matt.Aug 4 2022, 8:34 AM
dtemirbulatov marked an inline comment as done.

Fixed remarks.

paulwalker-arm added inline comments.Aug 5 2022, 8:15 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10119–10122

Can this be

if (*ExpectedTC > TTI->getMinTripCountTailFoldingThreshold())
  SEL = CM_ScalarEpilogueNotAllowedLowTripLoop

and still do what you need?

dtemirbulatov marked an inline comment as done.Aug 5 2022, 9:50 AM
dtemirbulatov added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10119–10122

I have to return false in order to prevent vectorization.

paulwalker-arm added inline comments.Aug 7 2022, 1:39 PM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10119–10122

Thanks @dtemirbulatov. In which case it looks like you need to restructure the code a little so we emit the necessary remark and debug information. e.g.

if (Hints.getForce() == LoopVectorizeHints::FK_Enabled)
...
else if (*ExpectedTC > TTI->getMinTripCountTailFoldingThreshold()) {
  LLVM_DEBUG(dbgs() << "\n");
  SEL = CM_ScalarEpilogueNotAllowedLowTripLoop;
} else {
  LLVM_DEBUG(dbgs() << " But the target considers the trip count too small to consider vectorizing.\n");
  reportVectorizationFailure(....);
  Hints.emitRemarkWithHints();
  return false;
}
dtemirbulatov marked an inline comment as done.

Updating proposed change according to @paulwalker-arm remarks.

paulwalker-arm accepted this revision.Aug 8 2022, 9:15 AM

A minor pre-push change but otherwise looks good. Once landed can you please do the necessary to request an LLVM 15 backport and then we can turn our attention to potentially fixing up the cost model so this rather brute force fix can be removed.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
10123

To make the output consistent please add a <space> before the 'But' because otherwise you'll end up printing overheads are incurred.But the target considers...

This revision is now accepted and ready to land.Aug 8 2022, 9:15 AM
dtemirbulatov closed this revision.Aug 10 2022, 5:28 AM

this was committed with cab6cd68340255be241b7cf169c67a1899ced115
but I made extra spaces before "Differential Revision", that is why it didn't close automatically.