For gathers which load in 8 and 16 bit data then use that as an index, the index can be extended to 32 bits instead of 64 bits
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I think the commit message needs to be more clear: my understanding from the IR is that you're not _always_ narrowing a gather's index if it's 64 bits, just if it was extended from < 32 to 64 and hence could trivially have been extended to 32 instead? I think the commit message needs to reflect that.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17671–17672 | Could you just have the following and remove one level of nesting? if (Index.getOpcode() == ISD::ZERO_EXTEND && tryNarrowZExtGatherIndex(N, Index, DAG)) return true; |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17655 | s/.getVectorElementType().getFixedSizeInBits()/.getScalarSizeInBits()/ |
Please can you also add the equivalent SIGN_EXTEND case.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17652 | I'll reserve judgement but it doesn't really seem worth breaking thus out into a separate function. You do need to ensure N is treating Index as unsigned before you can shrink the extend. MaskedGatherScatterSDNode has a function to query this. |
Update commit message
Added unsigned index check
Inlined logic instead of using a function
@MattDevereau: I've recently landed D130533 which I think can benefit you here. Updating isVectorShrinkable so it first handles your explicit extension cases before it drops into the BUILD_VECTOR handling might do what you need.
@paulwalker-arm I've extended isVectorShrinkable to shrink sign extended and zero extended gathers/scatters, however i'm seeing the following change to a pair of gather/scatter tests in sve-fixed-length-masked-gather.ll:
; CHECK-LABEL: masked_gather_32b_scaled_sext_f64: ; CHECK: // %bb.0: ; CHECK-NEXT: ptrue p0.d, vl32 ; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0] ; CHECK-NEXT: ld1sw { z1.d }, p0/z, [x1] ; CHECK-NEXT: fcmeq p1.d, p0/z, z0.d, #0.0 ; CHECK-NEXT: ld1d { z0.d }, p1/z, [x2, z1.d, lsl #3] ; CHECK-NEXT: st1d { z0.d }, p0, [x0] ; CHECK-NEXT: ret %cvals = load <32 x
; CHECK-LABEL: masked_gather_32b_scaled_sext_f64: ; CHECK: // %bb.0: ; CHECK-NEXT: ptrue p0.d, vl32 ; CHECK-NEXT: ptrue p1.s, vl32 ; CHECK-NEXT: ld1d { z0.d }, p0/z, [x0] ; CHECK-NEXT: ld1w { z1.s }, p1/z, [x1] ; CHECK-NEXT: fcmeq p1.d, p0/z, z0.d, #0.0 ; CHECK-NEXT: sunpklo z0.d, z1.s ; CHECK-NEXT: ld1d { z0.d }, p1/z, [x2, z0.d, lsl #3] ; CHECK-NEXT: st1d { z0.d }, p0, [x0] ; CHECK-NEXT: ret
define void @masked_gather_32b_scaled_sext_f64(<32 x double>* %a, <32 x i32>* %b, double* %base) vscale_range(16,0) #0 { %cvals = load <32 x double>, <32 x double>* %a %idxs = load <32 x i32>, <32 x i32>* %b %ext = sext <32 x i32> %idxs to <32 x i64> %ptrs = getelementptr double, double* %base, <32 x i64> %ext %mask = fcmp oeq <32 x double> %cvals, zeroinitializer %vals = call <32 x double> @llvm.masked.gather.v32f64(<32 x double*> %ptrs, i32 8, <32 x i1> %mask, <32 x double> undef) store <32 x double> %vals, <32 x double>* %a ret void
The test specifically wants the sign-extend which makes sense, but with my changes in we ignore the sign extend and then unpack it pointlessly. I'm assuming I need to check that the gather predicate doesn't come from a comparison with a wider vector before shrinking the vector?
These two tests currently have regressions:
b/llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll:@masked_gather_32b_scaled_sext_f64
b/llvm/test/CodeGen/AArch64/sve-fixed-length-masked-scatter.ll:@masked_scatter_32b_scaled_sext_f64
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp | ||
---|---|---|
296 ↗ | (On Diff #454483) | I'm probably just being paranoid but with this code now sitting after the call to getScalarSizeInBits() can you add something like assert(N->getValueType(0).isVector() && "Expected a vector!"); here? just in case. |
301 ↗ | (On Diff #454483) | It's not sufficient to only consider the result type of the extend. For example, take isVectorShrinkable(zext_to_i64(i48_node, 32, false) With the current implementation you'll return true but: zext_to_i64(trunc_to_i32(i48_node)) != zext_to_i64(i48_node) For the sext & zext cases I think you also need N's operand to be <= NewEltSize |
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll | ||
1036–1041 ↗ | (On Diff #454483) | I've had a look and believe the issue is that for fixed length vectors we don't want to shrink Index when the main datatype of the gather/scatter is a vector of 64bit values because the data elements must line up with the offset elements and when that is not the case operation legalisation will "fix" it by explicitly extending which ever is the smaller type. This is fixable within findMoreOptimalIndexType by just bailing about of such types just before the // Can indices be trivially shrunk? block. The reason this is a fixed-length only problem is because for scalable vectors we use an "unpacked" format where each element of a nxv2i32 vector sits within the bottom half of each element of a nxv2i64 vector so element of differing sizes remain aligned to each other. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17688 | Please pull this out into a separate check before this block, along with a suitable comment. The "fix" is not really related to isVectorShrinkable, it is just that today that is the only logic applicable for fixed length vectors. However, this might change in the future hence why I prefer an isolated check. |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17688 | I'm not sure if I'm missing something obvious here but I don't understand what you're expecting this to look like when this check is seperated. The way this function's logic is structured means it's difficult to bail out cleanly from individual checks here, as we want to fall through this if block instead of returning false. I could use a nested if block, e.g. //comment if (!(DataVT.getScalarSizeInBits() == 64 && DataVT.isFixedLengthVector())){ if (!ISD::isVectorShrinkable... ){ ... } } To sort of separate the "temporary" condition, but I believe it still needs to be and logic for the correct behaviour here |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
17688 | I was thinking of: // Fixed length vectors are always "packed" so there's no value in the index having a smaller element type than the data. if (DataVT.isFixedLengthVector() && DataVT.getScalarSizeInBits() == 64) return Changed; It is my assertion that we don't want to fall through because all code after this point is trying to rewrite Index to be a vector of i32s, which is never going to be good for fixed length vector types because Index will just get re-extended during operation legalisation. |
I'll reserve judgement but it doesn't really seem worth breaking thus out into a separate function.
You do need to ensure N is treating Index as unsigned before you can shrink the extend. MaskedGatherScatterSDNode has a function to query this.