This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Add support for scalable vectorization of induction variables
ClosedPublic

Authored by david-arm on Mar 16 2021, 8:45 AM.

Details

Summary

This patch adds support for the vectorization of induction variables when
using scalable vectors, which required the following changes:

  1. Removed assert from InnerLoopVectorizer::getStepVector.
  2. Modified InnerLoopVectorizer::createVectorIntOrFpInductionPHI to use a runtime determined value for VF and removed an assert.
  3. Modified InnerLoopVectorizer::buildScalarSteps to work for scalable vectors. I did this by calculating the full vector value for each Part of the unroll factor (UF) and caching this in the VP state. This means that we are always able to extract an arbitrary element from the vector if necessary. In addition to this, I also permitted the caching of the individual lane values themselves for the known minimum number of elements in the same way we do for fixed width vectors. This is a further optimisation that improves the code quality since it avoids unnecessary extractelement operations when extracting the first lane.
  4. Added an assert to InnerLoopVectorizer::widenPHIInstruction, since while testing some code paths I noticed this is currently broken for scalable vectors.

Various tests to support different cases have been added here:

Transforms/LoopVectorize/AArch64/sve-inductions.ll

Diff Detail

Event Timeline

david-arm created this revision.Mar 16 2021, 8:45 AM
david-arm requested review of this revision.Mar 16 2021, 8:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 16 2021, 8:45 AM
ctetreau added inline comments.Mar 16 2021, 9:23 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2536

what happens in this case now? I see a bunch of branches that handle the case where Lanes is greater than 1, but nothing for when it equals 1.

david-arm added inline comments.Mar 16 2021, 9:27 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2536

When this equals 1 due to being uniform the loop below works fine I think - we just don't bother creating a whole vector for each Part. We only create the first lane instead, which works for both fixed-width and scalable vectors.

Are you specifically referring to the possibility of something like <vscale x 1 x Ty>? If so, perhaps you're right and we should still generate a whole vector for each part.

david-arm updated this revision to Diff 331857.Mar 19 2021, 6:52 AM
  • Fixed buildScalarSteps so that generate the full vector part for VF=(1,scalable).
david-arm marked an inline comment as done.Mar 19 2021, 6:53 AM
david-arm added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2536

Hi @ctetreau, I've changed the code to now check for:

!IsUniform && VF.isScalable()

so that this is guaranteed to do something sensible for <vscale x 1 x ElTy> vectors as well.

ctetreau added inline comments.Mar 19 2021, 9:47 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2536

Thanks.

Unfortunately, I'm quite busy with internal stuff right now, so I probably won't have time to review this closely. If things calm down and this is still up, I'll try to take a closer look. However, if you get LGTM from somebody else feel free to just merge it.

kmclaughlin accepted this revision.Mar 26 2021, 11:41 AM

I think this looks good, I just have a couple of minor comments

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
2254

buildScalarSteps seems to use CreateSIToFP, should we also be using this here?

llvm/test/Transforms/LoopVectorize/AArch64/sve-inductions.ll
113

nit: I think this could be checking for %[[VSCALE]] instead of %6?

This revision is now accepted and ready to land.Mar 26 2021, 11:41 AM