For gathers that load 8- or 16-bit data and then use that data as an index, the index can be extended to 32 bits instead of 64 bits.
Diff Detail
- Repository: rG LLVM Github Monorepo
Event Timeline
I think the commit message needs to be clearer: my understanding from the IR is that you're not _always_ narrowing a gather's index if it's 64 bits, only if it was extended from < 32 bits to 64 and hence could trivially have been extended to 32 instead. The commit message should reflect that.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17858–17859

Could you just have the following and remove one level of nesting?

```cpp
if (Index.getOpcode() == ISD::ZERO_EXTEND &&
    tryNarrowZExtGatherIndex(N, Index, DAG))
  return true;
```
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17855

s/.getVectorElementType().getFixedSizeInBits()/.getScalarSizeInBits()/
Please can you also add the equivalent SIGN_EXTEND case.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17852

I'll reserve judgement, but it doesn't really seem worth breaking this out into a separate function. You do need to ensure N is treating Index as unsigned before you can shrink the extend. MaskedGatherScatterSDNode has a function to query this.
- Updated commit message
- Added unsigned index check
- Inlined logic instead of using a function
@MattDevereau: I've recently landed D130533 which I think can benefit you here. Updating isVectorShrinkable so it first handles your explicit extension cases before it drops into the BUILD_VECTOR handling might do what you need.
@paulwalker-arm I've extended isVectorShrinkable to shrink sign-extended and zero-extended gathers/scatters; however, I'm seeing the following change to a pair of gather/scatter tests in sve-fixed-length-masked-gather.ll:
Before:

```
; CHECK-LABEL: masked_gather_32b_scaled_sext_f64:
; CHECK:       // %bb.0:
; CHECK-NEXT:    ptrue p0.d, vl32
; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x0]
; CHECK-NEXT:    ld1sw { z1.d }, p0/z, [x1]
; CHECK-NEXT:    fcmeq p1.d, p0/z, z0.d, #0.0
; CHECK-NEXT:    ld1d { z0.d }, p1/z, [x2, z1.d, lsl #3]
; CHECK-NEXT:    st1d { z0.d }, p0, [x0]
; CHECK-NEXT:    ret
```

After:

```
; CHECK-LABEL: masked_gather_32b_scaled_sext_f64:
; CHECK:       // %bb.0:
; CHECK-NEXT:    ptrue p0.d, vl32
; CHECK-NEXT:    ptrue p1.s, vl32
; CHECK-NEXT:    ld1d { z0.d }, p0/z, [x0]
; CHECK-NEXT:    ld1w { z1.s }, p1/z, [x1]
; CHECK-NEXT:    fcmeq p1.d, p0/z, z0.d, #0.0
; CHECK-NEXT:    sunpklo z0.d, z1.s
; CHECK-NEXT:    ld1d { z0.d }, p1/z, [x2, z0.d, lsl #3]
; CHECK-NEXT:    st1d { z0.d }, p0, [x0]
; CHECK-NEXT:    ret
```

The test:

```
define void @masked_gather_32b_scaled_sext_f64(<32 x double>* %a, <32 x i32>* %b, double* %base) vscale_range(16,0) #0 {
  %cvals = load <32 x double>, <32 x double>* %a
  %idxs = load <32 x i32>, <32 x i32>* %b
  %ext = sext <32 x i32> %idxs to <32 x i64>
  %ptrs = getelementptr double, double* %base, <32 x i64> %ext
  %mask = fcmp oeq <32 x double> %cvals, zeroinitializer
  %vals = call <32 x double> @llvm.masked.gather.v32f64(<32 x double*> %ptrs, i32 8, <32 x i1> %mask, <32 x double> undef)
  store <32 x double> %vals, <32 x double>* %a
  ret void
}
```
The test specifically wants the sign extend, which makes sense, but with my changes we ignore the sign extend and then unpack it pointlessly. I'm assuming I need to check that the gather predicate doesn't come from a comparison with a wider vector before shrinking the vector?
These two tests currently have regressions:
- llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll: @masked_gather_32b_scaled_sext_f64
- llvm/test/CodeGen/AArch64/sve-fixed-length-masked-scatter.ll: @masked_scatter_32b_scaled_sext_f64
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:296

I'm probably just being paranoid, but with this code now sitting after the call to getScalarSizeInBits() can you add something like `assert(N->getValueType(0).isVector() && "Expected a vector!");` here? Just in case.

llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:303

It's not sufficient to only consider the result type of the extend. For example, take `isVectorShrinkable(zext_to_i64(i48_node), 32, false)`. With the current implementation you'll return true, but `zext_to_i64(trunc_to_i32(i48_node)) != zext_to_i64(i48_node)`. For the sext & zext cases I think you also need N's operand to be <= NewEltSize.
llvm/test/CodeGen/AArch64/sve-fixed-length-masked-gather.ll:1036–1041 (On Diff #454483)

I've had a look and believe the issue is that for fixed-length vectors we don't want to shrink Index when the main data type of the gather/scatter is a vector of 64-bit values, because the data elements must line up with the offset elements, and when that is not the case operation legalisation will "fix" it by explicitly extending whichever is the smaller type. This is fixable within findMoreOptimalIndexType by just bailing out of such types just before the `// Can indices be trivially shrunk?` block.

The reason this is a fixed-length-only problem is that for scalable vectors we use an "unpacked" format, where each element of an nxv2i32 vector sits within the bottom half of each element of an nxv2i64 vector, so elements of differing sizes remain aligned to each other.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17873

Please pull this out into a separate check before this block, along with a suitable comment. The "fix" is not really related to isVectorShrinkable; it is just that today that is the only logic applicable to fixed-length vectors. However, this might change in the future, hence why I prefer an isolated check.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17873

I'm not sure if I'm missing something obvious here, but I don't understand what you're expecting this to look like when this check is separated. The way this function's logic is structured means it's difficult to bail out cleanly from individual checks here, as we want to fall through this if block instead of returning false. I could use a nested if block, e.g.

```cpp
// comment
if (!(DataVT.getScalarSizeInBits() == 64 && DataVT.isFixedLengthVector())) {
  if (!ISD::isVectorShrinkable(...)) {
    ...
  }
}
```

to sort of separate the "temporary" condition, but I believe it still needs to be combined with the other conditions for the correct behaviour here.
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:17873

I was thinking of:

```cpp
// Fixed length vectors are always "packed" so there's no value in the index
// having a smaller element type than the data.
if (DataVT.isFixedLengthVector() && DataVT.getScalarSizeInBits() == 64)
  return Changed;
```

It is my assertion that we don't want to fall through, because all code after this point is trying to rewrite Index to be a vector of i32s, which is never going to be good for fixed-length vector types because Index will just get re-extended during operation legalisation.