This is an archive of the discontinued LLVM Phabricator instance.

[VPlan] Avoid scalarization for scalable vectors.
ClosedPublic

Authored by malharJ on Mar 11 2022, 2:00 AM.

Details

Summary

This patch ensures scalars (except for uniforms) are no
longer collected (prior to LVP planning phase) for
scalable vectorization.

This is to avoid the chances of generating scalarized
instructions later (during LVP execute phase) as they
are not supported for scalable vectorization.

Relevant test has also been added.

Diff Detail

Event Timeline

malharJ created this revision.Mar 11 2022, 2:00 AM
Herald added a project: Restricted Project. · View Herald Transcript
malharJ requested review of this revision.Mar 11 2022, 2:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 11 2022, 2:00 AM

Hi @malharJ, thanks for this fix! I believe this is doing the right thing, because for scalable vectors an instruction should either be considered uniform after vectorization, or it shouldn't be considered scalar.

Can you remove any references to SVE and instead replace it with scalable vectors in the commit message and title. This patch isn't specific to AArch64 SVE, but rather scalable vectors in general.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
4639–4641

nit: s/during planning [...] for SVE./for scalable vectors./

malharJ updated this revision to Diff 414630.Mar 11 2022, 3:42 AM
  • Updated terminology to use scalable vectors instead of SVE
malharJ marked an inline comment as done.Mar 11 2022, 3:42 AM
malharJ retitled this revision from [SVE][VPlan] Avoid collecting scalars for SVE to [VPlan] Avoid collecting scalars for SVE.
malharJ edited the summary of this revision. (Show Details)
malharJ edited the summary of this revision. (Show Details)
malharJ retitled this revision from [VPlan] Avoid collecting scalars for SVE to [VPlan] Avoid scalarization for scalable vectors..Mar 11 2022, 3:44 AM
fhahn added inline comments.Mar 11 2022, 3:45 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
6

Does this fail without the change or does this need 2>&1?

20

could use some better names

24

Is the bit cast needed or could this just load i64? It would also be good to add make sure the load is not dead in the loop.

sdesmalen added inline comments.Mar 11 2022, 3:51 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
6

I'd actually prefer this test to check the output of LoopVectorize so that we can make sure the output is as expected, as opposed to checking for the absence of a failure.
@malharJ You can probably use the update_test_checks script to generate CHECK lines for the output.

malharJ added inline comments.Mar 11 2022, 3:56 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
6

this test will fail without this change .. I dont think 2>&1 is needed.

24

the bitcast is actually needed.
It results in the gep not being classified as 'uniform' which is required for this test case to generate the assert failure (without this patch).

(the way the collectLoopUniforms() logic works is that it looks at the load/store
and places it's pointer operand (which is the bitcast here) into a worklist.
It then iterates over the bitcast's operands (which in this case is the gep) and checks for uniformity. And since the gep has one usage outside the loop, it is marked as not uniform. This consequently results in the loop induction update variable being marked as not uniform, but as a scalar (REPLICATE recipe) causing the assertion failure)

And regarding the dead load, I was just trying to create a minimal testcase.

malharJ updated this revision to Diff 415257.Mar 14 2022, 3:55 PM
  • Updated LIT test to use autogenerated CHECKs using update_test_checks.py
malharJ marked an inline comment as done.Mar 15 2022, 4:52 AM
sdesmalen added inline comments.Mar 15 2022, 5:36 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
8

nit:

s/it (and consequently [...] variable) from being classified as 'uniform'./
  the gep and consequently the loop induction update variable from being classified as 'uniform'/
42

It looks like this loop has been vectorized with an Interleave Factor of 2. Can you limit that using -force-vector-interleave=1, to reduce the number of CHECK lines?

100–102

nit: this alloca and the subsequent load/store seem unnecessary. Maybe you can just pass in some i32 %N as function argument for the number of iterations.

107

Can you make a loop that increments instead of decrements? That avoids the calls to @llvm.experimental.vector.reverse.nxv2f64 and makes the CHECK lines a bit simpler.

malharJ updated this revision to Diff 415812.Mar 16 2022, 6:49 AM
malharJ marked 4 inline comments as done.
  • addressed review comments to reduce size of test output
sdesmalen accepted this revision.Mar 16 2022, 6:53 AM

Thanks for the fix @malharJ, I'm happy with the patch now.

This revision is now accepted and ready to land.Mar 16 2022, 6:53 AM
fhahn added inline comments.Mar 16 2022, 6:55 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
6

this test will fail without this change .. I dont think 2>&1 is needed.

Oh right, it originally failed due to the crash, not the mismatch. The current checks should be sufficient now.

81

Might be good to clean up the basic block names here a bit.

This revision was landed with ongoing or failed builds.Mar 16 2022, 9:34 AM
This revision was automatically updated to reflect the committed changes.
malharJ marked 2 inline comments as done.
malharJ added inline comments.Mar 16 2022, 9:34 AM
llvm/test/Transforms/LoopVectorize/AArch64/scalable-avoid-scalarization.ll
20

Done as part of final commit.

81

Done as part of final commit.