Page MenuHomePhabricator

[LoopVectorize] Fix scalarisation crash in widenPHIInstruction for scalable vectors

Authored by david-arm on Apr 26 2021, 7:17 AM.



In InnerLoopVectorizer::widenPHIInstruction there are cases where we have
to scalarise a pointer induction variable after vectorisation. For scalable
vectors we already deal with the case where the pointer induction variable
is uniform, but we currently crash if not uniform. For fixed width vectors
we calculate every lane of the scalarised pointer induction variable for a
given VF, however this cannot work for scalable vectors. In this case I
have added support for caching the whole vector value for each unrolled
part so that we can always extract an arbitrary element. Additionally, we
still continue to cache the known minimum number of lanes too in order
to improve code quality by avoiding an extractelement operation.

I have adapted an existing test pointer_iv_mixed from the file:


and added it here for scalable vectors instead:


Diff Detail

Event Timeline

david-arm created this revision.Apr 26 2021, 7:17 AM
david-arm requested review of this revision.Apr 26 2021, 7:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 26 2021, 7:17 AM

I've reviewed what's in-diff in detail, I've run and looked at the code and it makes sense to me in as much as I understand it. However, I have little experience here so I will defer acceptance to someone else for now.


Nit: clang format failure.

sdesmalen added inline comments.Apr 30 2021, 6:52 AM

Can we unify this approach for both scalable and fixed-width vectors? Or phrased differently, is there still value in the per-lane calculation for fixed-width vectors? I would expect InstCombine to simplify any extract-elements from the splat + stepvector, if those are needed individually.

Matt added a subscriber: Matt.May 1 2021, 8:10 AM
david-arm updated this revision to Diff 343012.May 5 2021, 5:29 AM
  • Avoided caching additional lanes when scalarising a PHI for scalable vectors.
david-arm marked 2 inline comments as done.May 5 2021, 5:35 AM
david-arm added inline comments.

Hi @sdesmalen, so I looked into this and there is a problem for fixed-width vectors because the vector GEP could be created in a different block to the actual extractelement operation. I have added an appropriate fold in instcombine anyway - see D101900 - as you're right that this is useful. However, for non-uniform scalarisation with scalable vectors I have changed the code below to avoid caching the per-lane values with the continue; line.

sdesmalen added inline comments.May 10 2021, 3:20 AM

Please add a message that this is not yet handled (although I guess it could be in the future?)


same here.


nit: Can you move this out into a bool named NeedsVectorIndex ?

david-arm updated this revision to Diff 344039.May 10 2021, 6:13 AM
david-arm marked an inline comment as done.
  • Added strings to asserts.
  • Added new variable NeedsVectorIndex
david-arm marked 3 inline comments as done.May 10 2021, 6:13 AM
sdesmalen added inline comments.May 10 2021, 8:10 AM

Is there value in testing with an interleave count > 1 ?

david-arm updated this revision to Diff 344425.May 11 2021, 9:03 AM
  • Changed new test to use an interleave count of 2 to show that we correctly generate code for all Parts, not just Part 0.
david-arm marked an inline comment as done.May 11 2021, 9:03 AM
sdesmalen accepted this revision.May 11 2021, 9:27 AM

LGTM, thanks for updating the test!

This revision is now accepted and ready to land.May 11 2021, 9:27 AM
This revision was landed with ongoing or failed builds.May 12 2021, 3:02 AM
This revision was automatically updated to reflect the committed changes.