This is an archive of the discontinued LLVM Phabricator instance.

[SVE][CodeGen] Lower scalable masked scatters
AbandonedPublic

Authored by kmclaughlin on Oct 16 2020, 11:25 AM.

Details

Summary

Lowers the llvm.masked.scatter intrinsics (scalar plus vector addressing mode only)

Changes included in this patch:

  • Adds the IsTruncatingStore flag to MaskedScatterSDNode, set by getMaskedScatter()
  • Added refineIndexType() to DAGCombine which folds a sext/zext of the index into the index type. Also added refineUniformBase() which gets the base pointer and index from an add + splat_vector.
  • Custom lowering for MSCATTER, which chooses the appropriate scatter store opcode to use. Floating-point scatters are cast to integer, with patterns added to match FP reinterpret_casts.
  • Print details of MaskedScatterSDNodes (is truncating, signed or scaled) in SelectionDAGDumper
  • Tests with 32 & 64-bit scaled & unscaled offsets

Diff Detail

Event Timeline

kmclaughlin created this revision.Oct 16 2020, 11:25 AM
kmclaughlin requested review of this revision.Oct 16 2020, 11:25 AM

I'm a little suspicious that the DAGCombine changes don't have any effect on x86 regression tests; are we missing test coverage?

I'm a little suspicious that the DAGCombine changes don't have any effect on x86 regression tests; are we missing test coverage?

X86 does have its own sext combine. Does anything prevent refineIndexType from making index types that aren't supported by the hardware instruction?

kmclaughlin added a comment.EditedOct 20 2020, 9:35 AM

X86 does have its own sext combine. Does anything prevent refineIndexType from making index types that aren't supported by the hardware instruction?

Hi @craig.topper, there is nothing to prevent refineIndexType from returning an index type that isn't supported, though if the type of Index is illegal then I think it should be sign/zero extended again later during type legalisation.
Do you think it's worth adding an interface to ask whether the instruction can perform the sign/zero-extend first before removing it in refineIndexType?

X86 does have its own sext combine. Does anything prevent refineIndexType from making index types that aren't supported by the hardware instruction?

Hi @craig.topper, there is nothing to prevent refineIndexType from returning an index type that isn't supported, though if the type of Index is illegal then I think it should be sign/zero extended again later during type legalisation.
Do you think it's worth adding an interface to ask whether the instruction can perform the sign/zero-extend first before removing it in refineIndexType?

I was thinking of vXi8 and vXi16 vectors that can't be used as indices on X86.

I was thinking of vXi8 and vXi16 vectors that can't be used as indices on X86.

If vXi8 and vXi16 are not legal, the type-legaliser will re-add sign/zero extends when doing type-promotion in DAGTypeLegalizer::PromoteIntOp_MSCATTER, for which it uses the signedness of the MemIndexType. The same is also needed for SVE, which only supports nxvXi32 and nxvXi64 indices. But at least this ensures that e.g. a zero-extend of a legal nxv4i32 -> illegal nxv4i64 will get encoded as legal UNSIGNED_[UN]SCALED nxv4i32 for which the architecture will generate the instruction with zero-extending index. Do you think that will be sufficient?

I was thinking of vXi8 and vXi16 vectors that can't be used as indices on X86.

If vXi8 and vXi16 are not legal, the type-legaliser will re-add sign/zero extends when doing type-promotion in DAGTypeLegalizer::PromoteIntOp_MSCATTER, for which it uses the signedness of the MemIndexType. The same is also needed for SVE, which only supports nxvXi32 and nxvXi64 indices. But at least this ensures that e.g. a zero-extend of a legal nxv4i32 -> illegal nxv4i64 will get encoded as legal UNSIGNED_[UN]SCALED nxv4i32 for which the architecture will generate the instruction with zero-extending index. Do you think that will be sufficient?

vXi8/vXi16 are legal types on X86 but not supported as indices to gather.

I was thinking of vXi8 and vXi16 vectors that can't be used as indices on X86.

If vXi8 and vXi16 are not legal, the type-legaliser will re-add sign/zero extends when doing type-promotion in DAGTypeLegalizer::PromoteIntOp_MSCATTER, for which it uses the signedness of the MemIndexType. The same is also needed for SVE, which only supports nxvXi32 and nxvXi64 indices. But at least this ensures that e.g. a zero-extend of a legal nxv4i32 -> illegal nxv4i64 will get encoded as legal UNSIGNED_[UN]SCALED nxv4i32 for which the architecture will generate the instruction with zero-extending index. Do you think that will be sufficient?

vXi8/vXi16 are legal types on X86 but not supported as indices to gather.

Okay, I see what you mean. That suggests it would be useful to add a target interface to query if the type is legal as an index type for mgather/mscatter.

  • Added isLegalMaskedGSIndexType to query if the index type is legal for masked scatters (returning true for nxv2i32, nxv4i32 & nxv2i64 on AArch64)

Just a couple of general observations as I'm not sure if I'll get chance to properly review before the patch is accepted. I'm not expecting any changes based on these comments as they're effectively refactoring and thus getting to a state where MGATHER/MSCATTER works for SVE is a better starting point for such refactoring.

  1. I cannot help but think the conversion from MSCATTER to SVE intrinsic is backwards and will limit us in the future, or at least cause us to move code into lowering that probably doesn't belong.
  2. I want to restrict when and where AArch64ISD::REINTERPRET_CAST is used. You can see from the patterns you've added that some alias to ISD::BITCAST and ultimately allowing every combination will require a lot of patterns. Instead I plan to restrict AArch64ISD::REINTERPRET_CAST to only allow casting between vectors of the same element type.
  3. There's duplication between our enum representing SCALED and the MSNode's Scale operand. Personally I don't know if there's any call for an arbitrary Scale and it causes unnecessary (for AArch64 at least) complication.

Again, I'm not expecting any changes from you but just wanted to let you know my thinking for the future.

sdesmalen added inline comments.Oct 29 2020, 7:11 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9401–9402

This condition isn't correct, I think you want to always set the index type to unsigned when the index is zero-extended. Only the setting of Index to be the source operand of the zero-extend needs to be guarded by TLI.isLegalMaskedGSIndexType. (same for the sign-extend case below).

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7425–7430

Can you also add a check that the scalable flags match, i.e. that

N->getIndex().getValueType().getVectorElementCount().isScalable() == 
N->getValue().getValueType().getVectorElementCount().isScalable()
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3701

Is that right though? nxv2i32 is not a legal type.

I feel like the interface is actually asking the wrong question. I think the question the DAGCombiner wants to ask is: "should I remove the sign/zero-extend for the gather/scatter index". For SVE, that is true for indices that are extended like:

nxv4i32 -> nxv4i64
nxv8i32 -> nxv8i64
..

because the instruction can do this. It adds little value to add this for:

nxv2i32

because the type itself needs to be promoted to nxv2i64 anyway.

If it asks the right question and you make it work for nxv8i32, then codegen for the following test would much improve:

define void @masked_scatter_nxv8f32_zext(<vscale x 8 x float> %data, float* %base, <vscale x 8 x i32> %indexes, <vscale x 8 x i1> %masks) nounwind #0 {
  %ext = zext <vscale x 8 x i32> %indexes to <vscale x 8 x i64>
  %ptrs = getelementptr float, float* %base, <vscale x 8 x i64> %ext
  call void @llvm.masked.scatter.nxv8f32(<vscale x 8 x float> %data, <vscale x 8 x float*> %ptrs, i32 0, <vscale x 8 x i1> %masks)
  ret void
}
declare void @llvm.masked.scatter.nxv8f32(<vscale x 8 x float>, <vscale x 8 x float*>, i32, <vscale x 8 x i1>)

(for which tests are currently missing in the patch)

kmclaughlin marked 3 inline comments as done.
  • Changed isLegalMaskedGSIndexType to return true if the instruction can perform the sign extend (i.e. if the index is extended from i32 and the number of elements is at least 4)
  • Added getCanonicalIndexType to convert redundant addressing modes (e.g. scaling is redundant when accessing bytes)
  • Added a target-specific DAG combine for mscatter to promote indices smaller than i32
  • Added various tests for type legalisation in sve-masked-scatter-legalise.ll
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3701

Changed isLegalMaskedGSIndexType to instead return true only if the index is extended from i32 & the number of elements is >=4. The test above has been added to sve-masked-scatter-legalise.ll, along with some other tests where the offset vector type is illegal.

This patch got bigger incrementally and is now now really substantial and a bit difficult to review. Are there parts you can maybe post as a separate patch?

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
9387

This seems specific to AArch64 which has 64bit pointers. But I'd think this check is not needed to begin with, because the MaskedGS BasePtr must be a scalar pointer, or a vector of pointers, so the splat value would always be a pointer type (which I assume this code is checking).

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
3701

After this change, I think isLegalMaskedGSIndexType is a misnomer, because nxv8i32 is not a legal type for SVE.
How about bool shouldRemoveExtendFromGSIndex ? (defaulting to false)

16152

nit: s/ScaledIndex/IsScaledIndex/
nit: s/SignedIndex/IsSignedIndex/

llvm/test/CodeGen/AArch64/sve-masked-scatter-legalise.ll
1

nit: please use American spelling for the file name, i.e. legalize :)

39

nit: If you pass in %data as a <vscale x 32 x i8> and sign-extend it to <vscale x 32 x i32>, then your patch won't need to depend on D90219.

kmclaughlin abandoned this revision.Nov 6 2020, 7:14 AM
kmclaughlin marked 4 inline comments as done.

Abandoning this revision as it has been split into several smaller patches:

  • Add the truncating store flag: D90939
  • Lower masked scatters: D90941
  • Improve codegen with refineUniformBase & refineIndexType: D90942
  • DAG combine for extending index: D90945

@sdesmalen Thanks for taking another look at this - I have addressed your comments in the patches listed above