This is an archive of the discontinued LLVM Phabricator instance.

[LoopVectorize] Extract the last lane from a uniform store
ClosedPublic

Authored by kmclaughlin on Oct 28 2021, 8:27 AM.

Details

Summary

Changes VPReplicateRecipe to extract the last lane from an unconditional,
uniform store instruction. collectLoopUniforms will also add stores to
the list of uniform instructions where Legal->isUniformMemOp is true.

setCostBasedWideningDecision now sets the widening decision for
all uniform memory ops to Scalarize, where previously GatherScatter
may have been chosen for scalable stores.

This fixes an assert ("Cannot yet scalarize uniform stores") in
setCostBasedWideningDecision when we have a loop containing a
uniform i1 store and a scalable VF, which we cannot create a scatter for.

Diff Detail

Event Timeline

kmclaughlin created this revision.Oct 28 2021, 8:27 AM
kmclaughlin requested review of this revision.Oct 28 2021, 8:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 28 2021, 8:27 AM
sdesmalen added inline comments.Oct 29 2021, 12:53 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
7479–7480

This condition is always true, because it is enclosed by if (Legal->isUniformMemOp(I))

9821

This code needs a comment with rationale for extracting the last lane.

9822

While it is a concrete problem for scalable vectors, I don't think this is necessarily specific to scalable vectors and so we may want to do the same thing for fixed-width vectors. I'd expect other passes to remove the redundant scalar stores that are currently created, but it would be nice if those would not be generated in the first place.

9822

I would expect isScalarAfterVectorization to be set to true when the memory address is uniform and that to be used instead of '!IsUniform && isUniformMemOp(*I). Can you check whether this can be used instead?

kmclaughlin marked 3 inline comments as done.
  • Removed redundant Legal->isUniformMemOp(I) check from setCostBasedWideningDecision
  • Added a comment to VPReplicateRecipe::execute
  • Removed the State.VF.isScalable() check from VPReplicateRecipe::execute & updated the tests affected by this change. Also added a test of uniform stores for fixed-width.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9822

Hi @sdesmalen, I've removed State.VF.isScalable() and added a new test for fixed-width to LoopVectorize/uniform-store.ll. There were also a few existing tests with uniform stores affected by this change which have been updated.

9822

I tried removing !IsUniform && isUniformMemOp(*I) and replacing it with isScalarAfterVectorization, though this returns false for the instruction here and we continue on to hit the "Can't scalarize a scalable vector" assert below.

Matt added a subscriber: Matt.Nov 1 2021, 2:32 PM
kmclaughlin edited the summary of this revision. (Show Details)
  • Removed the uniform-store.ll test added in the previous revision.
kmclaughlin marked an inline comment as not done.Nov 2 2021, 11:25 AM
kmclaughlin added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9822

@sdesmalen I think you're right that isScalarAfterVectorization should be true for the store instruction and I've created D113034 (which changes collectLoopUniforms to also consider uniform stores) to address this.
I think we still need to check isUniformMemOp here though, since Scalars collects more than just uniform instructions and we only want to generate the last lane for uniform stores.

fhahn added a comment.Nov 3 2021, 1:33 AM

I think the title needs updating after the latest update (remove SVE).

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9822

If D113034 would be landed first, shouldn't IsUniform be set correctly? Ideally the uniform information should be explicit in the recipe and Legal should not be accessed during codegen.

kmclaughlin retitled this revision from [SVE][LoopVectorize] Extract the last lane from a uniform store to [LoopVectorize] Extract the last lane from a uniform store.
kmclaughlin edited the summary of this revision. (Show Details)
  • Merged with D113034, which makes changes to collectLoopUniforms to collect uniform store instructions.
kmclaughlin added inline comments.Nov 3 2021, 9:59 AM
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9822

Hi @fhahn, I don't think D113034 could be landed first since it requires changes from this patch to generate the last lane for stores? Since these patches are closely related I've merged them here so that I don't need to access Legal from VPReplicateRecipe.

The changes look good to me @kmclaughlin! I'll look through the remaining non-aarch64 tests later today. I had a couple of minor comments so far ...

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
9824

This change looks sensible to me, since I think we mark unrolled scalar stores as uniform.

llvm/test/Transforms/LoopVectorize/AArch64/sve-inv-store.ll
29

Perhaps it's worth deleting the CHECK lines from middle.block onwards as they don't add much value?

llvm/test/Transforms/LoopVectorize/AArch64/sve-uniform-store.ll
5

I wonder if it's perhaps worth moving these tests into sve-inv-store.ll, since they're testing the same thing?

david-arm added inline comments.Nov 4 2021, 4:40 AM
llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll
85

Hi @kmclaughlin, I don't think this looks right sadly. I wonder if now we're marking the store as uniform that you've exposed an existing bug in collectLoopUniforms or somewhere else like handleReplication? It looks like we're also now treating the the load as uniform, which is wrong in this case because we still want to do the vector load and store out the last lane. I'd expected something like:

[[WIDE_LOAD:%.*]] = load <4 x i32>, <4 x i32>* [[TMP22]], align 4
[[TMP23:%.*]] = add nsw <4 x i32> [[WIDE_LOAD]], <i32 1, i32 1, i32 1, i32 1>
[[TMP27:%.*]] = extractelement <4 x i32> [[TMP23]], i32 3
store i32 [[TMP27]], i32* [[ARRAYIDX7_US]], align 4
fhahn requested changes to this revision.Nov 4 2021, 5:17 AM
fhahn added inline comments.
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5480

Unfortunately this is incorrect now. The worklist currently relies on instructions only demanding the first lane, and this is used to propagate this property later on, using: if all users of an operand demand the first lane only, the operand itself also only needs to compute the first lane. I added a clarifying comment in b4992dbb21ff9159285ae0aec73f3d760344b0e5

Adding stores violates that. You should probably be able to work around that by adding them to Uniforms without adding them to the worklist. It might be worth calling out that entries in Uniforms may demand the first or last lane.

This revision now requires changes to proceed.Nov 4 2021, 5:17 AM
fhahn added a reviewer: Ayal.Nov 4 2021, 5:18 AM
kmclaughlin marked 2 inline comments as done.
  • Add store instructions to the Uniforms list in collectLoopUniforms, instead of the worklist. Added more comments to clarify that instructions in Uniforms may demand the first or last lane.
  • Moved the new tests in sve-uniform-store.ll into sve-inv-store.ll.
  • Removed the CHECK lines from middle.block from @inv_store_i16
llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5480

Thank you for adding a comment to addToWorklistIfAllowed, @fhahn. I've changed this so that store instructions are only added to Uniforms as suggested and added some comments which I hope makes this clear.

llvm/test/Transforms/LoopVectorize/X86/illegal-parallel-loop-uniform-write.ll
85

Thanks @david-arm, the load instruction was incorrectly being marked as uniform here. I think now that I've changed collectLoopUniforms so that stores are not added to the worklist, the output of this test looks as expected?

david-arm accepted this revision.Nov 5 2021, 7:07 AM

LGTM! It looks like you've addressed all the review comments here. Thanks @kmclaughlin!

fhahn accepted this revision.Nov 7 2021, 1:43 AM

LGTM, thanks!

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5483

nit: perhaps throw in an only (e.g. we instead add these to Uniforms only). Otherwise it may sound like the other instructions won't get added to Uniforms, which they will eventually.

This revision is now accepted and ready to land.Nov 7 2021, 1:43 AM
sdesmalen accepted this revision.Nov 8 2021, 12:06 AM

Nice improvement @kmclaughlin, thanks for addressing my comments.

llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
5481–5482

nit: Should the comment on line 5443 be updated?

This revision was landed with ongoing or failed builds.Nov 9 2021, 6:44 AM
This revision was automatically updated to reflect the committed changes.
kmclaughlin marked 2 inline comments as done.