This is an archive of the discontinued LLVM Phabricator instance.

Differential D94171

[SVE][CodeGen] Fix legalisation of floating-point masked gathers
Closed · Public

Authored by kmclaughlin on Jan 6 2021, 5:32 AM.

Details

Reviewers

sdesmalen
david-arm
efriedma

Commits

rGc37f68a8885c: [SVE][CodeGen] Fix legalisation of floating-point masked gathers

Summary

Changes in this patch:

When lowering floating-point masked gathers, cast the result of the gather back to the original type with reinterpret_cast before returning.
Added patterns for reinterpret_casts from integer to floating-point types, and concat_vectors patterns for bfloat16.
Added tests for various legalisation scenarios with floating-point types.
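
As an illustration of the first change: in LowerMGATHER the gather itself is performed on an integer vector with the same element count as the floating-point type, and the patch casts the result back afterwards. The standalone sketch below (plain C++, not LLVM code) mirrors the arithmetic AArch64::SVEBitsPerBlock / EC.getKnownMinValue() from the diff. The function name containerIntBits is hypothetical, and the constant is assumed to match AArch64::SVEBitsPerBlock (128).

```cpp
// Standalone model, not LLVM code. SVEBitsPerBlock is assumed to mirror
// AArch64::SVEBitsPerBlock (128); containerIntBits is a hypothetical name.
constexpr unsigned SVEBitsPerBlock = 128;

// Width in bits of the integer element used for the gather when the scalable
// vector has MinElts elements per 128-bit block (the N in nxvN...).
constexpr unsigned containerIntBits(unsigned MinElts) {
  return SVEBitsPerBlock / MinElts;
}

// These pairings match the reinterpret_cast patterns added by the patch.
static_assert(containerIntBits(2) == 64, "nxv2f16, nxv2f32, nxv2f64 <-> nxv2i64");
static_assert(containerIntBits(4) == 32, "nxv4f16, nxv4f32 <-> nxv4i32");
static_assert(containerIntBits(8) == 16, "nxv8f16, nxv8bf16 <-> nxv8i16");
```

The element count alone decides the integer type, which is why nxv2f16, nxv2f32 and nxv2f64 all pair with nxv2i64 in the new patterns.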

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

kmclaughlin created this revision. Jan 6 2021, 5:32 AM

Herald added subscribers: NickHung, psnobl, hiraditya, tschuett. Jan 6 2021, 5:32 AM

kmclaughlin requested review of this revision. Jan 6 2021, 5:32 AM

Herald added a project: Restricted Project. Jan 6 2021, 5:32 AM

Herald added a subscriber: llvm-commits.

david-arm added inline comments. Jan 6 2021, 6:13 AM

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
Line 3981: Is it worth having a helper function here, something like "isVectorUnpack(bool Signed)"? The reason I mention this is that there are two other places in the codebase where we also check if an opcode is "AArch64ISD::UUNPKLO || IdxOp == AArch64ISD::UUNPKHI".
llvm/test/CodeGen/AArch64/sve-masked-gather-legalize.ll
Line 74: Is it worth having tests that load <vscale x 4 x half> as well for both the ptrs and base+offset case?

Harbormaster completed remote builds in B84193: Diff 314867. Jan 6 2021, 6:21 AM

Added a new helper function, isVectorUnpack
Added tests which load <vscale x 4 x half> & <vscale x 2 x float>

LGTM! Thanks for making the changes.

This revision is now accepted and ready to land. Jan 7 2021, 5:11 AM

Removed the isVectorUnpack helper added in the previous revision. If the index values are already extended to i64 by an unpkhi/lo, then the gather does not also need to extend the index.
This affects the masked_gather_nxv4f64 test, which has been updated as follows.

Before:

sunpklo z1.d, z0.s
sunpkhi z2.d, z0.s
ld1d { z0.d }, p1/z, [x0, z1.d, sxtw #3]
ld1d { z1.d }, p0/z, [x0, z2.d, sxtw #3]

After:

sunpklo z1.d, z0.s
sunpkhi z2.d, z0.s
ld1d { z0.d }, p1/z, [x0, z1.d, lsl #3]
ld1d { z1.d }, p0/z, [x0, z2.d, lsl #3]
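
The rule behind this update can be sketched outside LLVM. The model below is a hypothetical, self-contained C++ approximation of the IdxNeedsExtend computation in LowerMGATHER; it is not the LLVM implementation. The GatherIndex struct, its field names and pickModifier are illustrative assumptions, and the mapping from signedness to sxtw or uxtw is inferred from the test output in this revision.

```cpp
// Standalone model, not LLVM code; all names here are illustrative.
enum class IndexModifier { LSL, SXTW, UXTW };

struct GatherIndex {
  unsigned ElementBits; // width of each index lane: 32 or 64
  bool ExtendedInReg;   // lane holds a value extended in-register from a narrower type
  bool IsSigned;        // index is interpreted as signed
};

// Mirrors the shape of IdxNeedsExtend in the diff: the gather extends the
// index only if it is 32 bits wide or was extended in-register. An index
// already widened to 64 bits by sunpklo/sunpkhi is neither.
constexpr bool idxNeedsExtend(const GatherIndex &I) {
  return I.ExtendedInReg || I.ElementBits == 32;
}

constexpr IndexModifier pickModifier(const GatherIndex &I) {
  return !idxNeedsExtend(I) ? IndexModifier::LSL
         : I.IsSigned       ? IndexModifier::SXTW
                            : IndexModifier::UXTW;
}

// masked_gather_nxv4f64: 64-bit lanes produced by sunpklo/sunpkhi.
static_assert(pickModifier(GatherIndex{64, false, true}) == IndexModifier::LSL,
              "already-unpacked 64-bit index needs no further extension");
// masked_gather_nxv8bf16: signed 32-bit lanes.
static_assert(pickModifier(GatherIndex{32, false, true}) == IndexModifier::SXTW,
              "signed 32-bit index is sign-extended by the gather");
```

Under this model the updated test uses lsl #3 because the unpack instructions have already produced full 64-bit indices, so a second extension in the addressing mode would be redundant.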

LGTM! I'd blame the reviewer for the bug in the previous patch. :)

sdesmalen accepted this revision. Jan 8 2021, 8:13 AM

Closed by commit rGc37f68a8885c: [SVE][CodeGen] Fix legalisation of floating-point masked gathers (authored by kmclaughlin). Jan 11 2021, 3:28 AM

This revision was automatically updated to reflect the committed changes.

kmclaughlin added a commit: rGc37f68a8885c: [SVE][CodeGen] Fix legalisation of floating-point masked gathers.

Revision Contents

Path                                                       Size
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp            11 lines
llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td             14 lines
llvm/test/CodeGen/AArch64/sve-masked-gather-legalize.ll    106 lines

Diff 315743

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp

@@ ... @@ for (auto VT : {MVT::nxv2f16, MVT::nxv4f16, MVT::nxv8f16, MVT::nxv2f32,
     setOperationAction(ISD::FP_ROUND, VT, Custom);
     setOperationAction(ISD::VECREDUCE_FADD, VT, Custom);
     setOperationAction(ISD::VECREDUCE_FMAX, VT, Custom);
     setOperationAction(ISD::VECREDUCE_FMIN, VT, Custom);
     setOperationAction(ISD::VECREDUCE_SEQ_FADD, VT, Custom);
   }

   for (auto VT : {MVT::nxv2bf16, MVT::nxv4bf16, MVT::nxv8bf16}) {
+    setOperationAction(ISD::CONCAT_VECTORS, VT, Custom);
     setOperationAction(ISD::MGATHER, VT, Custom);
     setOperationAction(ISD::MSCATTER, VT, Custom);
   }

   setOperationAction(ISD::SPLAT_VECTOR, MVT::nxv8bf16, Custom);

   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i8, Custom);
   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i16, Custom);
@@ ... @@ SDValue AArch64TargetLowering::LowerMGATHER(SDValue Op,

   ISD::MemIndexType IndexType = MGT->getIndexType();
   bool IsScaled =
       IndexType == ISD::SIGNED_SCALED || IndexType == ISD::UNSIGNED_SCALED;
   bool IsSigned =
       IndexType == ISD::SIGNED_SCALED || IndexType == ISD::SIGNED_UNSCALED;
   bool IdxNeedsExtend =
       getGatherScatterIndexIsExtended(Index) ||
       Index.getSimpleValueType().getVectorElementType() == MVT::i32;
   bool ResNeedsSignExtend = ExtTy == ISD::EXTLOAD || ExtTy == ISD::SEXTLOAD;

   EVT VT = PassThru.getSimpleValueType();
   EVT MemVT = MGT->getMemoryVT();
   SDValue InputVT = DAG.getValueType(MemVT);

   if (VT.getVectorElementType() == MVT::bf16 &&
       !static_cast<const AArch64Subtarget &>(DAG.getSubtarget()).hasBF16())
     return SDValue();

   // Handle FP data
   if (VT.isFloatingPoint()) {
-    VT = VT.changeVectorElementTypeToInteger();
     ElementCount EC = VT.getVectorElementCount();
     auto ScalarIntVT =
         MVT::getIntegerVT(AArch64::SVEBitsPerBlock / EC.getKnownMinValue());
     PassThru = DAG.getNode(AArch64ISD::REINTERPRET_CAST, DL,
                            MVT::getVectorVT(ScalarIntVT, EC), PassThru);

     InputVT = DAG.getValueType(MemVT.changeVectorElementTypeToInteger());
   }

   SDVTList VTs = DAG.getVTList(PassThru.getSimpleValueType(), MVT::Other);

   if (getGatherScatterIndexIsExtended(Index))
     Index = Index.getOperand(0);

   unsigned Opcode = getGatherVecOpcode(IsScaled, IsSigned, IdxNeedsExtend);
   selectGatherScatterAddrMode(BasePtr, Index, MemVT, Opcode,
                               /*isGather=*/true, DAG);

   if (ResNeedsSignExtend)
     Opcode = getSignExtendedGatherOpcode(Opcode);

   SDValue Ops[] = {Chain, Mask, BasePtr, Index, InputVT, PassThru};
-  return DAG.getNode(Opcode, DL, VTs, Ops);
+  SDValue Gather = DAG.getNode(Opcode, DL, VTs, Ops);
+
+  if (VT.isFloatingPoint()) {
+    SDValue Cast = DAG.getNode(AArch64ISD::REINTERPRET_CAST, DL, VT, Gather);
+    return DAG.getMergeValues({Cast, Gather}, DL);
+  }
+
+  return Gather;
 }

 SDValue AArch64TargetLowering::LowerMSCATTER(SDValue Op,
                                              SelectionDAG &DAG) const {
   SDLoc DL(Op);
   MaskedScatterSDNode *MSC = cast<MaskedScatterSDNode>(Op);
   assert(MSC && "Can only custom lower scatter store nodes");


llvm/lib/Target/AArch64/AArch64SVEInstrInfo.td

@@ ... @@ let Predicates = [HasSVE] in {

   // Concatenate two floating point vectors.
   def : Pat<(nxv4f16 (concat_vectors nxv2f16:$v1, nxv2f16:$v2)),
             (UZP1_ZZZ_S $v1, $v2)>;
   def : Pat<(nxv8f16 (concat_vectors nxv4f16:$v1, nxv4f16:$v2)),
             (UZP1_ZZZ_H $v1, $v2)>;
   def : Pat<(nxv4f32 (concat_vectors nxv2f32:$v1, nxv2f32:$v2)),
             (UZP1_ZZZ_S $v1, $v2)>;
+  def : Pat<(nxv4bf16 (concat_vectors nxv2bf16:$v1, nxv2bf16:$v2)),
+            (UZP1_ZZZ_S $v1, $v2)>;
+  def : Pat<(nxv8bf16 (concat_vectors nxv4bf16:$v1, nxv4bf16:$v2)),
+            (UZP1_ZZZ_H $v1, $v2)>;

   defm CMPHS_PPzZZ : sve_int_cmp_0<0b000, "cmphs", SETUGE, SETULE>;
   defm CMPHI_PPzZZ : sve_int_cmp_0<0b001, "cmphi", SETUGT, SETULT>;
   defm CMPGE_PPzZZ : sve_int_cmp_0<0b100, "cmpge", SETGE, SETLE>;
   defm CMPGT_PPzZZ : sve_int_cmp_0<0b101, "cmpgt", SETGT, SETLT>;
   defm CMPEQ_PPzZZ : sve_int_cmp_0<0b110, "cmpeq", SETEQ, SETEQ>;
   defm CMPNE_PPzZZ : sve_int_cmp_0<0b111, "cmpne", SETNE, SETNE>;

@@ ... @@
   def : Pat<(nxv2i64 (reinterpret_cast (nxv2f64 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
   def : Pat<(nxv2i64 (reinterpret_cast (nxv2f32 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
   def : Pat<(nxv2i64 (reinterpret_cast (nxv2f16 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
   def : Pat<(nxv4i32 (reinterpret_cast (nxv4f32 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
   def : Pat<(nxv4i32 (reinterpret_cast (nxv4f16 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
   def : Pat<(nxv2i64 (reinterpret_cast (nxv2bf16 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
   def : Pat<(nxv4i32 (reinterpret_cast (nxv4bf16 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;

+  def : Pat<(nxv2f16 (reinterpret_cast (nxv2i64 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
+  def : Pat<(nxv2f32 (reinterpret_cast (nxv2i64 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
+  def : Pat<(nxv2f64 (reinterpret_cast (nxv2i64 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
+  def : Pat<(nxv4f16 (reinterpret_cast (nxv4i32 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
+  def : Pat<(nxv4f32 (reinterpret_cast (nxv4i32 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
+  def : Pat<(nxv8f16 (reinterpret_cast (nxv8i16 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
+  def : Pat<(nxv2bf16 (reinterpret_cast (nxv2i64 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
+  def : Pat<(nxv4bf16 (reinterpret_cast (nxv4i32 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
+  def : Pat<(nxv8bf16 (reinterpret_cast (nxv8i16 ZPR:$src))), (COPY_TO_REGCLASS ZPR:$src, ZPR)>;
+
   def : Pat<(nxv16i1 (and PPR:$Ps1, PPR:$Ps2)),
             (AND_PPzPP (PTRUE_B 31), PPR:$Ps1, PPR:$Ps2)>;
   def : Pat<(nxv8i1 (and PPR:$Ps1, PPR:$Ps2)),
             (AND_PPzPP (PTRUE_H 31), PPR:$Ps1, PPR:$Ps2)>;
   def : Pat<(nxv4i1 (and PPR:$Ps1, PPR:$Ps2)),
             (AND_PPzPP (PTRUE_S 31), PPR:$Ps1, PPR:$Ps2)>;
   def : Pat<(nxv2i1 (and PPR:$Ps1, PPR:$Ps2)),
             (AND_PPzPP (PTRUE_D 31), PPR:$Ps1, PPR:$Ps2)>;
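
The concat_vectors patterns in this file select UZP1, which keeps the even-numbered lanes of each source operand. A rough fixed-length model is sketched below as standalone C++ (not LLVM or TableGen code). It assumes each unpacked input keeps its data in the even lanes with unused lanes in between, which is how an nxv2 vector appears when its 64-bit containers are viewed as 32-bit lanes. The Lanes struct, the uzp1 function and the fixed lane count are illustrative names only.

```cpp
#include <cstddef>

// Standalone model, not LLVM code. A fixed number of lanes stands in for a
// scalable vector; Lanes and uzp1 are illustrative names.
template <typename T, std::size_t N> struct Lanes {
  T V[N];
};

// UZP1 model: the low half of the result takes the even-numbered lanes of A,
// the high half takes the even-numbered lanes of B.
template <typename T, std::size_t N>
constexpr Lanes<T, N> uzp1(const Lanes<T, N> &A, const Lanes<T, N> &B) {
  static_assert(N % 2 == 0, "lane count must be even");
  Lanes<T, N> R{};
  for (std::size_t I = 0; I < N / 2; ++I) {
    R.V[I] = A.V[2 * I];         // even lanes of the first operand
    R.V[N / 2 + I] = B.V[2 * I]; // even lanes of the second operand
  }
  return R;
}

// Two unpacked halves: data in lanes 0 and 2, unused filler (0) in lanes 1 and 3.
constexpr Lanes<int, 4> Lo{{10, 0, 11, 0}};
constexpr Lanes<int, 4> Hi{{20, 0, 21, 0}};
constexpr Lanes<int, 4> Cat = uzp1(Lo, Hi);
static_assert(Cat.V[0] == 10 && Cat.V[1] == 11 && Cat.V[2] == 20 && Cat.V[3] == 21,
              "result is the data of Lo followed by the data of Hi");
```

In this model the filler lanes are dropped and the two data halves end up adjacent, which is the concatenation the patterns rely on.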

llvm/test/CodeGen/AArch64/sve-masked-gather-legalize.ll

@@ ... @@
 define <vscale x 2 x i32> @masked_gather_nxv2i32(<vscale x 2 x i32*> %ptrs, <vscale x 2 x i1> %mask) {
 ; CHECK-LABEL: masked_gather_nxv2i32:
 ; CHECK: ld1sw { z0.d }, p0/z, [z0.d]
 ; CHECK: ret
   %data = call <vscale x 2 x i32> @llvm.masked.gather.nxv2i32(<vscale x 2 x i32*> %ptrs, i32 4, <vscale x 2 x i1> %mask, <vscale x 2 x i32> undef)
   ret <vscale x 2 x i32> %data
 }

+define <vscale x 4 x half> @masked_gather_nxv4f16(<vscale x 4 x half*> %ptrs, <vscale x 4 x i1> %mask) {
+; CHECK-LABEL: masked_gather_nxv4f16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p1.b
+; CHECK-NEXT: zip2 p2.s, p0.s, p1.s
+; CHECK-NEXT: zip1 p0.s, p0.s, p1.s
+; CHECK-NEXT: ld1h { z1.d }, p2/z, [z1.d]
+; CHECK-NEXT: ld1h { z0.d }, p0/z, [z0.d]
+; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
+; CHECK-NEXT: ret
+  %data = call <vscale x 4 x half> @llvm.masked.gather.nxv4f16(<vscale x 4 x half*> %ptrs, i32 0, <vscale x 4 x i1> %mask, <vscale x 4 x half> undef)
+  ret <vscale x 4 x half> %data
+}
+
+define <vscale x 2 x float> @masked_gather_nxv2f32(float* %base, <vscale x 2 x i16> %indices, <vscale x 2 x i1> %mask) {
+; CHECK-LABEL: masked_gather_nxv2f32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p1.d
+; CHECK-NEXT: sxth z0.d, p1/m, z0.d
+; CHECK-NEXT: ld1w { z0.d }, p0/z, [x0, z0.d, sxtw #2]
+; CHECK-NEXT: ret
+  %ptrs = getelementptr float, float* %base, <vscale x 2 x i16> %indices
+  %data = call <vscale x 2 x float> @llvm.masked.gather.nxv2f32(<vscale x 2 x float*> %ptrs, i32 1, <vscale x 2 x i1> %mask, <vscale x 2 x float> undef)
+  ret <vscale x 2 x float> %data
+}
+
+define <vscale x 8 x half> @masked_gather_nxv8f16(<vscale x 8 x half*> %ptrs, <vscale x 8 x i1> %mask) {
+; CHECK-LABEL: masked_gather_nxv8f16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p1.b
+; CHECK-NEXT: zip2 p2.h, p0.h, p1.h
+; CHECK-NEXT: zip1 p0.h, p0.h, p1.h
+; CHECK-NEXT: zip2 p3.s, p2.s, p1.s
+; CHECK-NEXT: zip1 p2.s, p2.s, p1.s
+; CHECK-NEXT: ld1h { z3.d }, p3/z, [z3.d]
+; CHECK-NEXT: ld1h { z2.d }, p2/z, [z2.d]
+; CHECK-NEXT: zip2 p2.s, p0.s, p1.s
+; CHECK-NEXT: zip1 p0.s, p0.s, p1.s
+; CHECK-NEXT: ld1h { z1.d }, p2/z, [z1.d]
+; CHECK-NEXT: ld1h { z0.d }, p0/z, [z0.d]
+; CHECK-NEXT: uzp1 z2.s, z2.s, z3.s
+; CHECK-NEXT: uzp1 z0.s, z0.s, z1.s
+; CHECK-NEXT: uzp1 z0.h, z0.h, z2.h
+; CHECK-NEXT: ret
+  %data = call <vscale x 8 x half> @llvm.masked.gather.nxv8f16(<vscale x 8 x half*> %ptrs, i32 2, <vscale x 8 x i1> %mask, <vscale x 8 x half> undef)
+  ret <vscale x 8 x half> %data
+}
+
+define <vscale x 8 x bfloat> @masked_gather_nxv8bf16(bfloat* %base, <vscale x 8 x i16> %indices, <vscale x 8 x i1> %mask) #0 {
+; CHECK-LABEL: masked_gather_nxv8bf16:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p1.b
+; CHECK-NEXT: sunpkhi z1.s, z0.h
+; CHECK-NEXT: sunpklo z0.s, z0.h
+; CHECK-NEXT: zip2 p2.h, p0.h, p1.h
+; CHECK-NEXT: zip1 p0.h, p0.h, p1.h
+; CHECK-NEXT: ld1h { z1.s }, p2/z, [x0, z1.s, sxtw #1]
+; CHECK-NEXT: ld1h { z0.s }, p0/z, [x0, z0.s, sxtw #1]
+; CHECK-NEXT: uzp1 z0.h, z0.h, z1.h
+; CHECK-NEXT: ret
+  %ptrs = getelementptr bfloat, bfloat* %base, <vscale x 8 x i16> %indices
+  %data = call <vscale x 8 x bfloat> @llvm.masked.gather.nxv8bf16(<vscale x 8 x bfloat*> %ptrs, i32 1, <vscale x 8 x i1> %mask, <vscale x 8 x bfloat> undef)
+  ret <vscale x 8 x bfloat> %data
+}
+
+define <vscale x 4 x double> @masked_gather_nxv4f64(double* %base, <vscale x 4 x i16> %indices, <vscale x 4 x i1> %mask) {;
+; CHECK-LABEL: masked_gather_nxv4f64:
+; CHECK: // %bb.0:
+; CHECK-NEXT: ptrue p1.s
+; CHECK-NEXT: pfalse p2.b
+; CHECK-NEXT: sxth z0.s, p1/m, z0.s
+; CHECK-NEXT: zip1 p1.s, p0.s, p2.s
+; CHECK-NEXT: zip2 p0.s, p0.s, p2.s
+; CHECK-NEXT: sunpklo z1.d, z0.s
+; CHECK-NEXT: sunpkhi z2.d, z0.s
+; CHECK-NEXT: ld1d { z0.d }, p1/z, [x0, z1.d, lsl #3]
+; CHECK-NEXT: ld1d { z1.d }, p0/z, [x0, z2.d, lsl #3]
+; CHECK-NEXT: ret
+  %ptrs = getelementptr double, double* %base, <vscale x 4 x i16> %indices
+  %data = call <vscale x 4 x double> @llvm.masked.gather.nxv4f64(<vscale x 4 x double*> %ptrs, i32 1, <vscale x 4 x i1> %mask, <vscale x 4 x double> undef)
+  ret <vscale x 4 x double> %data
+}
+
+define <vscale x 8 x float> @masked_gather_nxv8f32(float* %base, <vscale x 8 x i32> %offsets, <vscale x 8 x i1> %mask) {
+; CHECK-LABEL: masked_gather_nxv8f32:
+; CHECK: // %bb.0:
+; CHECK-NEXT: pfalse p1.b
+; CHECK-NEXT: zip1 p2.h, p0.h, p1.h
+; CHECK-NEXT: zip2 p0.h, p0.h, p1.h
+; CHECK-NEXT: ld1w { z0.s }, p2/z, [x0, z0.s, uxtw #2]
+; CHECK-NEXT: ld1w { z1.s }, p0/z, [x0, z1.s, uxtw #2]
+; CHECK-NEXT: ret
+  %offsets.zext = zext <vscale x 8 x i32> %offsets to <vscale x 8 x i64>
+  %ptrs = getelementptr float, float* %base, <vscale x 8 x i64> %offsets.zext
+  %vals = call <vscale x 8 x float> @llvm.masked.gather.nxv8f32(<vscale x 8 x float*> %ptrs, i32 4, <vscale x 8 x i1> %mask, <vscale x 8 x float> undef)
+  ret <vscale x 8 x float> %vals
+}
+
 ; Code generate the worst case scenario when all vector types are legal.
 define <vscale x 16 x i8> @masked_gather_nxv16i8(i8* %base, <vscale x 16 x i8> %indices, <vscale x 16 x i1> %mask) {
 ; CHECK-LABEL: masked_gather_nxv16i8:
 ; CHECK-DAG: ld1sb { {{z[0-9]+}}.s }, {{p[0-9]+}}/z, [x0, {{z[0-9]+}}.s, sxtw]
 ; CHECK-DAG: ld1sb { {{z[0-9]+}}.s }, {{p[0-9]+}}/z, [x0, {{z[0-9]+}}.s, sxtw]
 ; CHECK-DAG: ld1sb { {{z[0-9]+}}.s }, {{p[0-9]+}}/z, [x0, {{z[0-9]+}}.s, sxtw]
 ; CHECK-DAG: ld1sb { {{z[0-9]+}}.s }, {{p[0-9]+}}/z, [x0, {{z[0-9]+}}.s, sxtw]
 ; CHECK: ret
@@ ... @@
 }

 declare <vscale x 2 x i8> @llvm.masked.gather.nxv2i8(<vscale x 2 x i8*>, i32, <vscale x 2 x i1>, <vscale x 2 x i8>)
 declare <vscale x 2 x i16> @llvm.masked.gather.nxv2i16(<vscale x 2 x i16*>, i32, <vscale x 2 x i1>, <vscale x 2 x i16>)
 declare <vscale x 2 x i32> @llvm.masked.gather.nxv2i32(<vscale x 2 x i32*>, i32, <vscale x 2 x i1>, <vscale x 2 x i32>)
 declare <vscale x 4 x i8> @llvm.masked.gather.nxv4i8(<vscale x 4 x i8*>, i32, <vscale x 4 x i1>, <vscale x 4 x i8>)
 declare <vscale x 16 x i8> @llvm.masked.gather.nxv16i8(<vscale x 16 x i8*>, i32, <vscale x 16 x i1>, <vscale x 16 x i8>)
 declare <vscale x 32 x i32> @llvm.masked.gather.nxv32i32(<vscale x 32 x i32*>, i32, <vscale x 32 x i1>, <vscale x 32 x i32>)
+
+declare <vscale x 4 x half> @llvm.masked.gather.nxv4f16(<vscale x 4 x half*>, i32, <vscale x 4 x i1>, <vscale x 4 x half>)
+declare <vscale x 8 x half> @llvm.masked.gather.nxv8f16(<vscale x 8 x half*>, i32, <vscale x 8 x i1>, <vscale x 8 x half>)
+declare <vscale x 8 x bfloat> @llvm.masked.gather.nxv8bf16(<vscale x 8 x bfloat*>, i32, <vscale x 8 x i1>, <vscale x 8 x bfloat>)
+declare <vscale x 2 x float> @llvm.masked.gather.nxv2f32(<vscale x 2 x float*>, i32, <vscale x 2 x i1>, <vscale x 2 x float>)
+declare <vscale x 8 x float> @llvm.masked.gather.nxv8f32(<vscale x 8 x float*>, i32, <vscale x 8 x i1>, <vscale x 8 x float>)
+declare <vscale x 4 x double> @llvm.masked.gather.nxv4f64(<vscale x 4 x double*>, i32, <vscale x 4 x i1>, <vscale x 4 x double>)
+attributes #0 = { "target-features"="+sve,+bf16" }