This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] Expand gather index to 32 bits instead of 64 bits
ClosedPublic

Authored by MattDevereau on Jul 28 2022, 2:28 AM.

Details

Summary

For gathers which load 8-bit or 16-bit data and then use it as an index, the index can be extended to 32 bits instead of 64 bits

Diff Detail

Event Timeline

MattDevereau created this revision.Jul 28 2022, 2:28 AM
Herald added a project: Restricted Project. · View Herald Transcript
MattDevereau requested review of this revision.Jul 28 2022, 2:28 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 28 2022, 2:28 AM

I think the commit message needs to be clearer: my understanding from the IR is that you're not _always_ narrowing a gather's index if it's 64 bits, just if it was extended from < 32 to 64 and hence could trivially have been extended to 32 instead? The commit message should reflect that.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18296–18297

Could you just have the following and remove one level of nesting?

if (Index.getOpcode() == ISD::ZERO_EXTEND && tryNarrowZExtGatherIndex(N, Index, DAG))
  return true;
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18293

s/.getVectorElementType().getFixedSizeInBits()/.getScalarSizeInBits()/

Matt added a subscriber: Matt.Jul 28 2022, 8:29 AM

Please can you also add the equivalent SIGN_EXTEND case.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18290

I'll reserve judgement but it doesn't really seem worth breaking this out into a separate function.

You do need to ensure N is treating Index as unsigned before you can shrink the extend. MaskedGatherScatterSDNode has a function to query this.

MattDevereau retitled this revision from [AArch64][SVE] Narrow 64bit gather index to 32bit to [AArch64][SVE] Expand gather index to 32 bits instead of 64 bits.
MattDevereau edited the summary of this revision. (Show Details)

Update commit message
Added unsigned index check
Inlined logic instead of using a function

@MattDevereau: I've recently landed D130533 which I think can benefit you here. Updating isVectorShrinkable so it first handles your explicit extension cases before it drops into the BUILD_VECTOR handling might do what you need.

MattDevereau added a comment.EditedAug 22 2022, 3:45 AM

@paulwalker-arm I've extended isVectorShrinkable to shrink sign-extended and zero-extended gathers/scatters, however I'm seeing the following change to a pair of gather/scatter tests in sve-fixed-length-masked-gather.ll:

; CHECK-LABEL: masked_gather_32b_scaled_sext_f64:
; CHECK:       // %bb.0:
; CHECK-NEXT:    ptrue p0.d, vl32
; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x0]
; CHECK-NEXT:    ld1sw { z1.d }, p0/z, [x1]
; CHECK-NEXT:    fcmeq p1.d, p0/z, z0.d, #0.0
; CHECK-NEXT:    ld1d { z0.d }, p1/z, [x2, z1.d, lsl #3]
; CHECK-NEXT:    st1d { z0.d }, p0, [x0]
; CHECK-NEXT:    ret
; CHECK-LABEL: masked_gather_32b_scaled_sext_f64:
; CHECK:       // %bb.0:
; CHECK-NEXT:    ptrue p0.d, vl32
; CHECK-NEXT:    ptrue p1.s, vl32
; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x0]
; CHECK-NEXT:    ld1w { z1.s }, p1/z, [x1]
; CHECK-NEXT:    fcmeq p1.d, p0/z, z0.d, #0.0
; CHECK-NEXT:    sunpklo z0.d, z1.s
; CHECK-NEXT:    ld1d { z0.d }, p1/z, [x2, z0.d, lsl #3]
; CHECK-NEXT:    st1d { z0.d }, p0, [x0]
; CHECK-NEXT:    ret
define void @masked_gather_32b_scaled_sext_f64(<32 x double>* %a, <32 x i32>* %b, double* %base) vscale_range(16,0) #0 {
  %cvals = load <32 x double>, <32 x double>* %a
  %idxs = load <32 x i32>, <32 x i32>* %b
  %ext = sext <32 x i32> %idxs to <32 x i64>
  %ptrs = getelementptr double, double* %base, <32 x i64> %ext
  %mask = fcmp oeq <32 x double> %cvals, zeroinitializer
  %vals = call <32 x double> @llvm.masked.gather.v32f64(<32 x double*> %ptrs, i32 8, <32 x i1> %mask, <32 x double> undef)
  store <32 x double> %vals, <32 x double>* %a
  ret void
}

The test specifically wants the sign-extend which makes sense, but with my changes in we ignore the sign extend and then unpack it pointlessly. I'm assuming I need to check that the gather predicate doesn't come from a comparison with a wider vector before shrinking the vector?

These two tests currently have regressions:
b/llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll:@masked_gather_32b_scaled_sext_f64
b/llvm/test/CodeGen/AArch64/sve-fixed-length-masked-scatter.ll:@masked_scatter_32b_scaled_sext_f64

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
297

I'm probably just being paranoid, but with this code now sitting after the call to getScalarSizeInBits() can you add something like assert(N->getValueType(0).isVector() && "Expected a vector!"); here, just in case?

304

It's not sufficient to only consider the result type of the extend. For example, take

isVectorShrinkable(zext_to_i64(i48_node), 32, false)

With the current implementation you'll return true but:

zext_to_i64(trunc_to_i32(i48_node)) != zext_to_i64(i48_node)

For the sext & zext cases I think you also need the scalar size of N's operand to be <= NewEltSize

llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll
1036–1041 ↗(On Diff #454483)

I've had a look and believe the issue is that for fixed-length vectors we don't want to shrink Index when the main data type of the gather/scatter is a vector of 64-bit values, because the data elements must line up with the offset elements, and when that is not the case operation legalisation will "fix" it by explicitly extending whichever is the smaller type.

This is fixable within findMoreOptimalIndexType by just bailing out of such types just before the // Can indices be trivially shrunk? block.

The reason this is a fixed-length-only problem is that for scalable vectors we use an "unpacked" format where each element of an nxv2i32 vector sits within the bottom half of each element of an nxv2i64 vector, so elements of differing sizes remain aligned to each other.

paulwalker-arm added inline comments.Sep 23 2022, 5:35 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18316

Please pull this out into a separate check before this block, along with a suitable comment. The "fix" is not really related to isVectorShrinkable, it is just that today that is the only logic applicable for fixed length vectors. However, this might change in the future hence why I prefer an isolated check.

MattDevereau added inline comments.Sep 26 2022, 3:37 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18316

I'm not sure if I'm missing something obvious here, but I don't understand what you're expecting this to look like when this check is separated. The way this function's logic is structured means it's difficult to bail out cleanly from individual checks here, as we want to fall through this if block instead of returning false. I could use a nested if block, e.g.

// comment
if (!(DataVT.getScalarSizeInBits() == 64 && DataVT.isFixedLengthVector())) {
  if (!ISD::isVectorShrinkable...) {
    ...
  }
}

to sort of separate the "temporary" condition, but I believe the conditions still need to be combined with AND for the correct behaviour here.
Could you please elaborate on what you're expecting?

paulwalker-arm added inline comments.Sep 26 2022, 4:17 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
18316

I was thinking of:

// Fixed length vectors are always "packed" so there's no value in the
// index having a smaller element type than the data.
if (DataVT.isFixedLengthVector() && DataVT.getScalarSizeInBits() == 64)
  return Changed;

It is my assertion that we don't want to fall through because all code after this point is trying to rewrite Index to be a vector of i32s, which is never going to be good for fixed length vector types because Index will just get re-extended during operation legalisation.

paulwalker-arm accepted this revision.Sep 26 2022, 4:06 PM
This revision is now accepted and ready to land.Sep 26 2022, 4:06 PM
This revision was landed with ongoing or failed builds.Sep 28 2022, 7:45 AM
This revision was automatically updated to reflect the committed changes.