This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CodeGen] Lower scalable masked gathers
ClosedPublic

Authored by kmclaughlin on Nov 9 2020, 10:50 AM.

Details

Summary

Lowers the llvm.masked.gather intrinsics (scalar plus vector addressing mode only)

Changes in this patch:

  • Add custom lowering for MGATHER, using getGatherVecOpcode() to choose the appropriate gather load opcode (a short, hedged sketch of this selection follows the list below).
  • Improve codegen with refineIndexType/refineUniformBase, added in D90942
  • Tests added for gather loads with 32 & 64-bit scaled & unscaled offsets.
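
As a rough, hedged sketch of how such a selector might look (the AArch64ISD node names below exist upstream, but the function signature and the set of cases handled are assumptions for illustration, not the exact patch code):

static unsigned getGatherVecOpcode(bool IsScaled, bool IsSigned) {
  // Pick between the scaled/unscaled and signed/unsigned 32-bit offset forms
  // of the SVE gather load nodes.
  if (IsScaled)
    return IsSigned ? AArch64ISD::GLD1_SXTW_SCALED_MERGE_ZERO
                    : AArch64ISD::GLD1_UXTW_SCALED_MERGE_ZERO;
  return IsSigned ? AArch64ISD::GLD1_SXTW_MERGE_ZERO
                  : AArch64ISD::GLD1_UXTW_MERGE_ZERO;
}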

Diff Detail

Event Timeline

kmclaughlin created this revision.Nov 9 2020, 10:50 AM
Herald added a project: Restricted Project.
kmclaughlin requested review of this revision.Nov 9 2020, 10:50 AM
  • Rebased after updating the parent patch, D90942
  • Moved SplitVecRes_MGATHER/SplitVecOp_MGATHER changes to D91084

@craig.topper After changing the refineUniformBase function in D90942 to use getSplatVector() when trying to get the base pointer from a splat, the output of one of the X86 masked gather tests in masked_gather_scatter.ll has changed ("test14", where the base pointer is not a splat, so getUniformBase returns false).
Could you please take a look over this to make sure the changes are correct in this circumstance?
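
For context, a hedged sketch of the kind of check being described (the shape, and the use of the generic getSplatValue() helper, are assumptions and not the exact D90942 code):

static bool refineUniformBase(SDValue &BasePtr, SDValue &Index,
                              SelectionDAG &DAG) {
  // Only look through an add of a splatted base pointer and a vector of
  // offsets; anything else (such as test14's non-splat base) is left alone.
  if (!isNullConstant(BasePtr) || Index.getOpcode() != ISD::ADD)
    return false;

  if (SDValue SplatVal = DAG.getSplatValue(Index.getOperand(0))) {
    BasePtr = SplatVal;
    Index = Index.getOperand(1);
    return true;
  }
  return false;
}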

I think it's fine. That code wouldn't have made it past InstCombine in that form. The insertelement would have been deleted by SimplifyDemandedElts.

kmclaughlin edited the summary of this revision. (Show Details)
  • Added the getExtendedGatherOpcode() function, which returns a signed gather load opcode (e.g. GLD1_MERGE_ZERO -> GLD1S_MERGE_ZERO)
  • Get the extension type of the gather load in LowerMGATHER and use the signed gather opcode returned by getExtendedGatherOpcode() if the extension type is EXTLOAD/SEXTLOAD (see the illustrative sketch after this list).
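
An illustrative, hedged sketch of the mapping described above (the GLD1/GLD1S names are real AArch64ISD opcodes, but the exact cases covered are an assumption; a later update to this patch drops the function again):

static unsigned getExtendedGatherOpcode(unsigned Opcode) {
  // Map a zero-extending gather node to its sign-extending counterpart.
  switch (Opcode) {
  case AArch64ISD::GLD1_MERGE_ZERO:
    return AArch64ISD::GLD1S_MERGE_ZERO;
  case AArch64ISD::GLD1_SCALED_MERGE_ZERO:
    return AArch64ISD::GLD1S_SCALED_MERGE_ZERO;
  case AArch64ISD::GLD1_SXTW_MERGE_ZERO:
    return AArch64ISD::GLD1S_SXTW_MERGE_ZERO;
  case AArch64ISD::GLD1_UXTW_MERGE_ZERO:
    return AArch64ISD::GLD1S_UXTW_MERGE_ZERO;
  default:
    return Opcode;
  }
}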

Reordering the scalable masked gather patch series that this is a part of, as suggested on D91084. This is now the first patch in the series.

For this patch, the changes involved removing references to ExtensionType from LowerMGATHER (and also removing the getExtendedGatherOpcode() function).

  • Moved the addition of the -aarch64-enable-mgather-combine flag from D92230 into this patch & updated the RUN lines of tests included here to use this option (set to 0)
  • Moved changes to DAGCombiner.cpp into a new patch, D92319
  • I mistakenly created this patch to depend on D92319; this revision removes that dependency
sdesmalen accepted this revision.Nov 30 2020, 8:45 AM

Thanks for splitting this work up into smaller patches. This patch in the series looks good to me now!

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
116

nit: can you add a comment here to describe that this option is temporary and only exists for the purpose of testing functionality added to DAGCombiner.cpp. Perhaps you can also say there is the expectation to remove it in the future, when both implementations will be based on MGATHER rather than relying on the ISD nodes we added for the llvm.aarch64.sve.ld1.gather intrinsics.
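
Something along these lines would do, as a hedged sketch (the variable name and description string are placeholders; cl::opt is the standard LLVM command-line option machinery):

// NOTE: This option is temporary and only exists to test functionality added
// to DAGCombiner.cpp. It is expected to be removed once both implementations
// are based on MGATHER, rather than relying on the ISD nodes added for the
// llvm.aarch64.sve.ld1.gather intrinsics.
static cl::opt<bool> EnableMGatherCombine(
    "aarch64-enable-mgather-combine", cl::Hidden, cl::init(true),
    cl::desc("Enable DAG combines on scalable masked gathers"));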

This revision is now accepted and ready to land.Nov 30 2020, 8:45 AM
sdesmalen requested changes to this revision.Nov 30 2020, 9:05 AM
sdesmalen added inline comments.
llvm/test/CodeGen/AArch64/sve-masked-gather-legalize.ll
11

Sorry, just spotting this now while reviewing D92319. This instruction should not use the scaled addressing mode.

That seems to be because SelectionDAGBuilder::visitMaskedGather defaults to scaled. In D90941 you've changed it to unscaled for MSCATTER. You'll need to do the same for MGATHER, because this leads to wrong code being generated.
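
For reference, a hedged sketch of the fallback in SelectionDAGBuilder::visitMaskedGather being discussed (Base, Index, Scale, IndexType, Ptr, sdl, TLI and DAG are assumed from that function's surrounding context):

// When no uniform base can be extracted, the whole pointer vector becomes the
// index and the base is constant zero, so the IndexType should describe an
// unscaled byte offset rather than an element-scaled one.
if (!UniformBase) {
  Base = DAG.getConstant(0, sdl, TLI.getPointerTy(DAG.getDataLayout()));
  Index = getValue(Ptr);
  IndexType = ISD::SIGNED_UNSCALED; // previously ISD::SIGNED_SCALED
  Scale = DAG.getTargetConstant(1, sdl, TLI.getPointerTy(DAG.getDataLayout()));
}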

This revision now requires changes to proceed.Nov 30 2020, 9:05 AM
kmclaughlin marked an inline comment as done.
  • Changed the default IndexType set by visitMaskedGather to ISD::SIGNED_UNSCALED
sdesmalen added inline comments.Dec 2 2020, 6:04 AM
llvm/test/CodeGen/AArch64/sve-masked-gather-32b-unsigned-scaled.ll
12

Another thing I spotted: this should use uxtw, or use sxtw but then still keep the original zero-extend (the and).
So something in this patch is a bit too eager in removing the extend.

  • Removed Index = Index.getOperand(0) from LowerMGATHER, which was incorrectly removing the sign/zero extend of Index if getGatherScatterIndexIsExtended returns true
kmclaughlin marked an inline comment as done.Dec 2 2020, 9:22 AM
kmclaughlin added inline comments.
llvm/test/CodeGen/AArch64/sve-masked-gather-32b-unsigned-scaled.ll
12

Good spot! I've removed the following lines from LowerMGATHER, which were removing the extends:

if (getGatherScatterIndexIsExtended(Index))
  Index = Index.getOperand(0);

I will move this to D92319, where it should be safe to remove the extend after refineIndexType has already folded it into the index type.
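
For context, a hedged sketch of what a check like getGatherScatterIndexIsExtended() looks for (the exact cases are an assumption for illustration):

static bool getGatherScatterIndexIsExtended(SDValue Index) {
  // The index is "extended" if it is already a sign- or zero-extension of a
  // narrower vector of offsets.
  unsigned Opcode = Index.getOpcode();
  return Opcode == ISD::SIGN_EXTEND || Opcode == ISD::ZERO_EXTEND ||
         Opcode == ISD::SIGN_EXTEND_INREG;
}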

sdesmalen accepted this revision.Dec 3 2020, 2:40 AM

LGTM, thanks for the changes!

This revision is now accepted and ready to land.Dec 3 2020, 2:40 AM
This revision was automatically updated to reflect the committed changes.
kmclaughlin marked an inline comment as done.
yubing added inline comments.Dec 18 2020, 5:42 AM
llvm/lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp
1765

Do we really need to split the MemoryType here? I observed that in WidenVecRes_MGATHER, we don't widen the MemoryType.
If we have a v17i32 masked_gather in AVX512, we widen it to a v32i32 masked_gather with a v17i32 MemoryType. When SplitVecRes_MGATHER processes this v32i32 masked_gather, line 1765 will fail its assertion, since what you are going to split is v17i32.

Is that not a bug in WidenVecRes_MGATHER? I would have thought there's an expectation that the element count of MGATHER would match the element count of its MemVT. That is to say that these nodes allow loads with element extension rather than loads where the vector is packed into a larger container.
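
As a hedged illustration of that invariant (MGT and DAG are assumed from the SplitVecRes_MGATHER context; GetSplitDestVTs is the existing SelectionDAG helper), the result VT and the MemVT are split in lockstep, which only works when their element counts match:

// Split both the result type and the memory type of the gather. If the result
// was widened (e.g. to v32i32) while the MemVT stayed narrower (e.g. v17i32),
// these two splits no longer line up and the assertion mentioned above fires.
EVT LoVT, HiVT, LoMemVT, HiMemVT;
std::tie(LoVT, HiVT) = DAG.GetSplitDestVTs(MGT->getValueType(0));
std::tie(LoMemVT, HiMemVT) = DAG.GetSplitDestVTs(MGT->getMemoryVT());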