This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Take vscale into account when deciding to create epilogues
ClosedPublic

Authored by david-arm on Apr 4 2023, 5:28 AM.

Details

Summary

In LoopVectorizationCostModel::isEpilogueVectorizationProfitable we
check to see if the chosen main vector loop VF >= 16. If so, we
decide to create a vector epilogue loop. However, this doesn't
take VScaleForTuning into account because we could be targeting a
CPU where vscale > 1, and hence the runtime VF would be a multiple
of the known minimum value.

This patch multiplies scalable VFs by VScaleForTuning and several
tests have been updated that now produce vector epilogues.

Diff Detail

Event Timeline

david-arm created this revision.Apr 4 2023, 5:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 4 2023, 5:28 AM
david-arm requested review of this revision.Apr 4 2023, 5:28 AM
fhahn added a subscriber: fhahn.Apr 4 2023, 9:45 AM

It looks like intrinsiccost.ll is failing in the precommit tests?

It looks like intrinsiccost.ll is failing in the precommit tests?

Well spotted @fhahn, thanks! Not sure what happened there as I thought I'd run make check. Oh well!

david-arm updated this revision to Diff 511013.Apr 5 2023, 1:37 AM
  • Reverted test changes for intrinsiccost.ll.

I'm worried the test changes don't look relevant to the code change. I mean, sure the changes are are the effect of the code change, but the tests themselves are not related to epilogue vectorisation? nor do they specify what to tune for and thus they've only changed because of the current default for this tuning option. This means the tests will change if/when the default changes. I think it would be better to have a dedicated test file that shows the result of a simple loop when RUN using several cpu tuning parameters. And for the other/existing loop vectorisation tests to be independent of the effect of tuning when the effect is not relevant to what the test is protecting (which is how you're already handling runtime-check-size-based-threshold.ll).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5625–5626

You could use Multiplier = getVScaleForTuning().value_or(1) here if you wanted. At some point in the future we might want to change this to value_or(MinVScale) anyway.

david-arm updated this revision to Diff 512842.Apr 12 2023, 8:13 AM
  • Reverted test changes and added a new specific test for epilogue vectorisation with vscale tuning.
david-arm marked an inline comment as done.Apr 12 2023, 8:13 AM
paulwalker-arm accepted this revision.Apr 12 2023, 8:37 AM

I've a couple of suggestions but otherwise looks good.

llvm/test/Transforms/LoopVectorize/AArch64/sve-epilog-vect-vscale-tune.ll
4–5

Perhaps add a RUN line for the default (i.e. with no -mcpu option) that can presumably reuse the check lines for CHECK-NV1.

52

It's up to you Dave but I don't see autogenerating the check lines offering any value here. If anything it makes it harder to understand the effect. Something simple like:

; CHECK-EPILOGUE:        vec.epilog.ph:
; CHECK-EPILOGUE:        load <vscale x 4 x i16>
; CHECK-NO-EPILOGUE-NOT: vec.epilog.ph:

seems like a clearer test?

This revision is now accepted and ready to land.Apr 12 2023, 8:37 AM
Matt added a subscriber: Matt.Apr 19 2023, 9:31 AM
dewen added a subscriber: dewen.Tue, Nov 14, 1:25 AM