This is an archive of the discontinued LLVM Phabricator instance.

[LV] Fix incorrectly marking a pointer indvar as 'scalar'.
ClosedPublic

Authored by sdesmalen on Nov 22 2021, 8:35 AM.

Details

Summary

collectLoopScalars should only add non-uniform nodes to the list if they
are used by a load/store instruction that is marked as CM_Scalarize.

Before this patch, the LV incorrectly marked pointer induction variables
as 'scalar' when they required to be widened by something else,
such as a compare instruction, and weren't used by a node marked as
'CM_Scalarize'. This case is covered by sve-widen-phi.ll.

This change also allows removing some code where the LV tried to
widen the PHI nodes with a stepvector, even though it was marked as
'scalarAfterVectorization'. Now that this code is more careful about
marking instructions that need widening as 'scalar', this code has
become redundant.

Diff Detail

Event Timeline

sdesmalen created this revision.Nov 22 2021, 8:35 AM
sdesmalen requested review of this revision.Nov 22 2021, 8:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 22 2021, 8:35 AM

Thanks for the fix @sdesmalen! I've not quite finished reviewing all the test changes, but I have a few comments so far ... :)

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4866–4867

If we don't expect to widen here anymore (i.e. you removed the IsUniform && VF.isScalable() stuff), can we also just assert(IsUniform) and only generate the first lane? If not, I think it's worth having an assert that VF is not scalable here because we're only caching the first known minimum number of lanes here.

5209–5210

I think the comment needs updating here to remove the bits about inductions.

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
51

I'm actually surprised these aren't folded away when creating the mul in IRBuilder, but that's not a problem with your patch!

64

I think this may potentially lead to poorer code quality here with the lane extract, although that may be a price worth paying for overall correctness. I'm not certain if the extract will get folded with the vector GEP and transformed into a scalar GEP here. Maybe you could run -instcombine on this code to double check the output?

169

Do we need all the check lines after this point, since we only really care about the vector loop?

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
137

Again, it's worth checking whether ot not these extracts get folded away?

Thanks for the comments @david-arm!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4866–4867

If we don't expect to widen here anymore (i.e. you removed the IsUniform && VF.isScalable() stuff), can we also just assert(IsUniform) and only generate the first lane?

I don't think so, because for fixed-width vectors this is still a valid code-path to go down. i.e. when it has chosen to scalarize the pointer-induction value used in a store (set to CM_Scalarize), which means it's ScalarAfterVectorization, but not Uniform.

If not, I think it's worth having an assert that VF is not scalable here because we're only caching the first known minimum number of lanes here.

For scalable vectors this code-path is still sensible, but only if the value is Uniform, so I can add assert(IsUniform || !VF.isScalable()).

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll
64

It doesn't get folded away by InstCombine because [[TMP13]] has more than one use (see D101900). I'm not sure if this is really a problem in practice though, because it's only concerning lane 0, which is cheap to extract. I found that the resulting assembly when removing the 'hasOneUse' restriction was the same.

169

Possibly not, but because this file had:

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py

at the top, I figured to just update the check lines.

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
137

The reason the extract is not folded away is because InstCombine::visitExtractElementInst doesn't look through bitcasts, because bitcasts may change the number of elements. Perhaps we can change that if the number of elements is unchanged, but then it will run into the hasOneUse restriction.

sdesmalen updated this revision to Diff 389791.Nov 25 2021, 7:39 AM

Added assert in widenPHIInstruction.

david-arm accepted this revision.Nov 26 2021, 3:29 AM

LGTM!

llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll
208

nit: Before merging the patch could you rename this block to something like exit or for.end? It just looks a bit weird having a preheader as the exit. :)

This revision is now accepted and ready to land.Nov 26 2021, 3:29 AM
fhahn added inline comments.Nov 26 2021, 3:34 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5209–5210

The comment still mentioned pointer inductions. This is out-of-date now, right?

sdesmalen updated this revision to Diff 389996.Nov 26 2021, 4:13 AM

Updated comment.

sdesmalen updated this revision to Diff 389998.Nov 26 2021, 4:15 AM

Also updated the label in the test.

sdesmalen marked 8 inline comments as done.Nov 26 2021, 4:16 AM
sdesmalen added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5209–5210

Thanks for pointing out, I missed that earlier!

This revision was landed with ongoing or failed builds.Nov 28 2021, 1:50 AM
This revision was automatically updated to reflect the committed changes.
sdesmalen marked an inline comment as done.